Interactive computer system recognizing spoken commands.



An interactive computer system having a processor executing a target computer program, and having a
speech recognizer for converting an utterance into a command signal for the target computer program. The
target computer program has a series of active program states occurring over a series of time periods. At least a
first active-state image is displayed for a first active state occurring during a first time period. At least one object
displayed in the first active-state image is identified, and a list of one or more first active-state commands
identifying functions which can be performed in the first active state of the target computer program is generated
from the identified object. A first active-state vocabulary of acoustic command models for the first active state
comprises the acoustic command models from a system vocabulary representing the first active-state
commands. A speech recognizer measures the value of at least one feature of an utterance during each of a series
of successive time intervals within the first time period to produce a series of feature signals. The measured
feature signals are compared to each of the acoustic command models in the first active-state vocabulary to
generate a match score for the utterance and each acoustic command model. The speech recognizer outputs a
command signal corresponding to the command model from the first active-state vocabulary having the best
match score.
Background of the Invention

The invention relates to interactive computer systems in which a user provides commands to a target
computer program executing on the computer system by way of an input device. The input device may be,
for example, a keyboard, a mouse device, or a speech recognizer. For each input device, an input signal
generated by the input device is translated into a form usable by the target computer program.

An interactive computer system in which the user can provide commands by speaking the commands
may consist of a processor executing a target computer program having commands identifying functions
which can be performed by the target computer program. The computer system further includes a speech
recognizer for recognizing the spoken commands and for outputting command signals corresponding to the
recognized commands. The speech recognizer recognizes a spoken command by measuring the value of at
least one feature of an utterance during each of a series of successive time intervals to produce a series of
feature signals, comparing the measured feature signals to each of a plurality of acoustic command
models to generate a match score for the utterance and each acoustic command model, and outputting a
command signal corresponding to the command model having the best match score.

The set of utterance models and words represented by the utterance models which the speech
recognizer can recognize is referred to as the system vocabulary. The system vocabulary is finite and may,
for example, range from one utterance model to thousands of utterance models. Each utterance model may
represent one word, or may represent a combination of two or more words spoken continuously (without a
pause between the words).

The system vocabulary may contain, for example, utterance models of all of the commands to which 
the target computer program is capable of responding. However, as the number of utterance models 
increases, the time required to perform utterance recognition using the entire system vocabulary increases, 
and the recognition accuracy decreases. 

Generally, a target computer program has a series of active states occurring over a series of time
periods. For each active state, there may be a list of active state commands identifying functions which can
be performed in the active state. The active state commands may be a small subset of the system
vocabulary. The translation of an uttered command to a form usable by the target computer program in one
state of the target computer program may be different from the translation of the same command in another
state of the target computer program.

In order to improve the speed and accuracy of the speech recognizer, it is desirable to restrict the
active vocabulary of utterance models which the speech recognizer can recognize in any given time period
to the active state commands identifying functions which can be performed by the target computer program
in that time period. To attempt to achieve this result, the speech recognizer may be provided with a finite
state machine which duplicates the active states and transitions between active states of the target
computer program.

In practice, it has been found impossible to build a finite state machine for the speech recognizer which
exactly duplicates the active states and transitions between active states of the target computer program.
The target computer program not only interacts with the user, but also interacts with data and other devices
of the computer system whose states cannot be known in advance.

For example, a command to load a file will cause a computer program to make a transition to one state
if the file exists, or to a different state if the file does not exist. However, the speech recognizer finite state
machine must be built with some assumption that the file exists or does not exist. If a command to load a
file is spoken to the computer program using the speech recognizer, then the speech recognizer finite state
machine may or may not track the computer program state correctly, depending on whether that file exists
or does not exist. If the speech recognizer finite state machine assumes that the file exists, but in fact the
file does not exist, then the speech recognizer state machine will enter a state different from the state of the
target computer program. As a result, the target computer program can no longer receive valid input from
the speech recognizer.


Summary of the Invention 

It is an object of the invention to provide an interactive computer system having a target computer
program having a series of active program states occurring over a series of time periods, and having a
speech recognizer in which the active vocabulary of commands recognized by the speech recognizer in
any given time period is restricted to a list of active commands identifying functions which can be
performed by the target computer program in that given time period, without having to predict in advance
the states and transitions between states of the target computer program which will occur under all possible
circumstances.

According to the invention, an interactive computer system comprises a processor executing a target
computer program having a series of active program states occurring over a series of time periods. The
target computer program generates active state image data signals representing an active state image for
the active state of the target computer program occurring during each time period. Each active state image
contains one or more objects.

The interactive computer system further comprises means for displaying at least a first active-state 
image for a first active state occurring during a first time period. Means are provided for identifying at least 
one object displayed in the first active-state image, and for generating from the identified object a list of one 
or more first active-state commands identifying functions which can be performed in the first active state of
the target computer program. 

Means are also provided for storing a system vocabulary of acoustic command models. Each acoustic 
command model represents one or more series of acoustic feature values representing an utterance of one 
or more words associated with the acoustic command model. The system further includes means for 
identifying a first active-state vocabulary of acoustic command models for the first active state. The first
active-state vocabulary comprises the acoustic command models from the system vocabulary representing 
the first active-state commands. 

The interactive computer system comprises a speech recognizer for measuring the value of at least one 
feature of an utterance during each of a series of successive time intervals within the first time period to 
produce a series of feature signals. The speech recognizer compares the measured feature signals to each
of the acoustic command models in the first active-state vocabulary to generate a match score for the 
utterance and each acoustic command model. The speech recognizer then outputs a command signal 
corresponding to the command model from the first active-state vocabulary having the best match score. 

The first active-state vocabulary preferably comprises substantially less than all the acoustic command
models from the system vocabulary. The speech recognizer does not compare the measured feature
signals for the first time period to any acoustic command model which is not in the first active-state
vocabulary.

In one embodiment of the interactive computer system according to the invention, the display means 
displays at least a second active-state image different from the first active-state image for a second active 
state occurring during a second time period different from the first time period. The object identifying
means identifies at least one object displayed in the second active-state image, and generates a list of one
or more second active-state commands identifying functions which can be performed in the second active
state of the target computer program. 

The active-state vocabulary identifying means identifies a second active-state vocabulary of acoustic 
command models for the second active state. The second active-state vocabulary comprises the acoustic
command models from the system vocabulary representing the second active-state commands. The second 
active-state vocabulary is at least partly different from the first active-state vocabulary. 

The speech recognizer measures the value of at least one feature of an utterance during each of a 
series of successive time intervals within the second time period to produce a series of feature signals. The 
speech recognizer compares the measured feature signals for the second time period to each of the
acoustic command models in the second active-state vocabulary to generate a match score for the 
utterance and each acoustic command model. The speech recognizer then outputs a command signal 
corresponding to the command model from the second active-state vocabulary having the best match 
score. 

The target computer program may, for example, have only one active state occurring during each time
period. The target computer program may comprise an operating system program alone, an application 
program and an operating system program combined, or two or more application programs and an 
operating system program. 

At least some of the commands for an active state identify functions which can be performed on the
identified objects in the active-state image for the state.

The identified object in an active-state image may, for example, comprise one or more of a character, a
word, an icon, a button, a scroll bar, a slider, a list box, a menu, a check box, a container, or a notebook.

In an alternative embodiment of the invention, the speech recognizer may output two or more command 
signals corresponding to the command models from the active-state vocabulary having the best match 
scores for a given time period.

The vocabulary of acoustic command models for each active state may further comprise a set of global 
acoustic command models representing global commands identifying functions which can be performed in 
each active state of the target computer program. 





EP 0 621 531 A1 



The display means may comprise, for example, a cathode ray tube display, a liquid crystal display, or a 
printer. 

The display means may display both an active-state image for an active state occurring during a time
period, and at least a portion of one or more images for program states not occurring during the time
period.

A method of computer interaction according to the invention comprises executing, on a processor, a 
target computer program having a series of active program states occurring over a series of time periods. 
The target computer program generates active state image data signals representing an active state image 
for the active state of the target computer program occurring during each time period. Each active state
image contains one or more objects. The method further comprises displaying at least a first active-state
image for a first active state occurring during a first time period. At least one object displayed in the first 
active-state image is identified, and a list of one or more first active-state commands identifying functions 
which can be performed in the first active state of the target computer program is generated from the 
identified object. 

A system vocabulary of acoustic command models is stored. Each acoustic command model represents
one or more series of acoustic feature values representing an utterance of one or more words
associated with the acoustic command model. A first active-state vocabulary of acoustic command models 
for the first active state is identified. The first active-state vocabulary comprises the acoustic command 
models from the system vocabulary representing the first active-state commands. 

The value of at least one feature of an utterance is measured during each of a series of successive time
intervals within the first time period to produce a series of feature signals. The measured feature signals are 
compared to each of the acoustic command models in the first active-state vocabulary to generate a match 
score for the utterance and each acoustic command model. A command signal corresponding to the 
command model from the first active state vocabulary having the best match score is output. 

By identifying at least one object displayed in the active-state image of the target computer program,
and by generating from the identified object a list of one or more active-state commands identifying
functions which can be performed in the active state of the target computer program, the active-state
vocabulary of the speech recognizer can be limited to a small subset of the system vocabulary representing
active-state commands, without having to predict in advance the states and transitions between states of the
target computer program which will occur under all possible circumstances.

Brief Description of the Drawing 

Figure 1 is a block diagram of an example of an interactive computer system according to the invention.

Figure 2 shows an example of a first active-state image for a first active state of a target computer
program.

Figure 3 is a block diagram of an example of a speech recognizer for an interactive computer system 
according to the invention. 

Figure 4 shows an example of a second active-state image for a second active state of a target
computer program.

Figure 5 is a block diagram of an example of an acoustic command model store for the system 
vocabulary of an interactive computer system according to the invention. 

Figure 6 is a block diagram of an acoustic processor for the speech recognizer of Figure 3. 

Figure 7 schematically shows an example of an acoustic command model.

Figure 8 schematically shows an example of an acoustic model of a phoneme for constructing an
acoustic command model.

Figure 9 schematically shows an example of paths through the acoustic model of Figure 7. 

Description of the Preferred Embodiments 


Figure 1 is a block diagram of an example of an interactive computer system according to the invention. 
The interactive computer system comprises a processor 10 executing a target computer program having a 
series of active program states occurring over a series of time periods. The target computer program 
generates active state image data signals representing an active state image for the active state of the
target computer program occurring during each time period. Each active state image contains one or more
objects. 

The processor may be, for example, a personal computer, a computer workstation, or any other
microcomputer, minicomputer, or mainframe computer.
The target computer program may be an operating system program such as DOS, Microsoft Windows
(trademark), OS/2 (trademark), AIX (trademark), UNIX (trademark), X-Windows, or any other operating
system. The target computer program may comprise one or more application programs executing with an
operating system program. Application programs include spreadsheet programs, word processing
programs, database programs, educational programs, recreational programs, communication programs, and
many more.

Objects in an active-state image may comprise one or more of a character, a word, an icon, a button, a
scroll bar, a slider, a list box, a menu, a check box, a container, a notebook, or some other item.

The interactive computer system further comprises display means 12 for displaying at least a first
active-state image for a first active state occurring during a first time period. The display means may be, for
example, a cathode ray tube display, a liquid crystal display, or a printer.

Figure 2 shows an example of a hypothetical first active-state image for a first active state occurring
during a first time period. In this example, the active-state image includes a frame object 14 containing a
title bar object 16, a menu bar object 18, a list box object 20, and a push button object 22. The menu bar
object 18 includes an "items" object, an "options" object, and an "exit" object. The list box object 20
includes a vertical scroll bar object 24, and "blue", "green", "red", "orange", "black", "white", and "purple"
objects. In the list box 20, only the "blue", "green", "red", "orange", and "black" objects are shown in
Figure 2. The "white" and "purple" objects are contained in the list box and could be made visible by
scrolling with the vertical scroll bar 24.

The active state image data signals may be generated by the target computer program, for example, by
using operating system interrupts, function calls, or application program interface calls.

Example I, below, illustrates C programming language source code for creating active state image data 
signals. 
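For illustration only, the object hierarchy of Figure 2 could be held in memory as a tree of object records, one plausible form for active state image data. The sketch below is not Example I; the type names `ObjType` and `ImageObject`, the table `figure2`, and the helper `count_objects` are hypothetical, and the "help" label on push button 22 is assumed from Table 1.

```c
#include <assert.h>
#include <stddef.h>

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical object kinds, taken from the object types named in the text. */
typedef enum {
    OBJ_FRAME, OBJ_TITLE_BAR, OBJ_MENU_BAR, OBJ_MENU_ITEM,
    OBJ_LIST_BOX, OBJ_LIST_ITEM, OBJ_SCROLL_BAR, OBJ_PUSH_BUTTON
} ObjType;

/* One object of an active-state image, with its child objects. */
typedef struct ImageObject {
    ObjType type;
    const char *label;                  /* visible text, or NULL if none */
    int visible;                        /* 0 if scrolled out of view */
    const struct ImageObject *children;
    size_t n_children;
} ImageObject;

/* Contents of menu bar 18 and list box 20 of Figure 2. */
static const ImageObject menu_items[] = {
    {OBJ_MENU_ITEM, "items", 1, NULL, 0},
    {OBJ_MENU_ITEM, "options", 1, NULL, 0},
    {OBJ_MENU_ITEM, "exit", 1, NULL, 0},
};
static const ImageObject list_items[] = {
    {OBJ_SCROLL_BAR, NULL, 1, NULL, 0},    /* vertical scroll bar 24 */
    {OBJ_LIST_ITEM, "blue", 1, NULL, 0},
    {OBJ_LIST_ITEM, "green", 1, NULL, 0},
    {OBJ_LIST_ITEM, "red", 1, NULL, 0},
    {OBJ_LIST_ITEM, "orange", 1, NULL, 0},
    {OBJ_LIST_ITEM, "black", 1, NULL, 0},
    {OBJ_LIST_ITEM, "white", 0, NULL, 0},  /* reachable only by scrolling */
    {OBJ_LIST_ITEM, "purple", 0, NULL, 0},
};
static const ImageObject frame_children[] = {
    {OBJ_TITLE_BAR, NULL, 1, NULL, 0},                        /* 16 */
    {OBJ_MENU_BAR, NULL, 1, menu_items, COUNT(menu_items)},   /* 18 */
    {OBJ_LIST_BOX, NULL, 1, list_items, COUNT(list_items)},   /* 20 */
    {OBJ_PUSH_BUTTON, "help", 1, NULL, 0},                    /* 22, label assumed */
};
/* Frame object 14, the root of the Figure 2 image. */
static const ImageObject figure2 = {
    OBJ_FRAME, NULL, 1, frame_children, COUNT(frame_children)
};

/* Count the objects in a subtree; with only_visible set, hidden objects
   (and anything inside them) are skipped. */
size_t count_objects(const ImageObject *obj, int only_visible) {
    size_t i, n = 1;
    assert(obj != NULL);
    if (only_visible && !obj->visible)
        return 0;
    for (i = 0; i < obj->n_children; i++)
        n += count_objects(&obj->children[i], only_visible);
    return n;
}
```

Walking a tree of this kind, built from what the program actually displays, is what would let the command list follow the program's real state rather than a predicted one.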

Returning to Figure 1, the interactive computer system further comprises an image object identifier 26
for identifying at least one object displayed in the first active-state image, and for generating from the
identified object a list of one or more first active-state commands identifying functions which can be 
performed in the first active-state of the target computer program. 

The image object identifier 26 may comprise computer program subroutines designed to intercept
(hook) operating system function calls and application program interface calls provided by one or more
target computer programs, and/or may comprise computer program subroutines for using operating system
interrupts, function calls, or application program interface calls for identifying objects displayed in the first 
active-state image of the target computer program. Example II, below, illustrates C programming language 
source code for identifying at least one object displayed in an active state image. 
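The hooking approach can be pictured with an ordinary C function pointer: if the target program reaches the window system through a pointer, the identifier can save the original pointer, substitute a wrapper that records each object, and forward the call unchanged. This is a minimal, platform-neutral sketch and not Example II; `create_object`, `install_hook`, and `was_seen` are hypothetical names, and real operating systems provide their own hook mechanisms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical window-system call the target program uses to create an object. */
typedef int (*CreateObjectFn)(const char *kind, const char *label);

static int next_handle = 1;

/* Stand-in for the real system call: just hands out object handles. */
static int real_create_object(const char *kind, const char *label) {
    (void)kind;
    (void)label;
    return next_handle++;
}

/* The target program calls through this pointer, so it can be redirected. */
static CreateObjectFn create_object = real_create_object;

/* Objects recorded by the hook. Only the label pointer is stored, so
   labels must outlive the record (true for string literals). */
#define MAX_SEEN 32
static const char *seen_labels[MAX_SEEN];
static size_t n_seen = 0;
static CreateObjectFn original_create = NULL;

/* Wrapper: record the object, then forward to the original call. */
static int hooked_create_object(const char *kind, const char *label) {
    if (n_seen < MAX_SEEN)
        seen_labels[n_seen++] = label;
    return original_create(kind, label);
}

/* Install the hook once: save the original pointer, then substitute the wrapper. */
void install_hook(void) {
    assert(original_create == NULL);
    original_create = create_object;
    create_object = hooked_create_object;
}

/* Return 1 if an object with this label has been recorded by the hook. */
int was_seen(const char *label) {
    size_t i;
    for (i = 0; i < n_seen; i++)
        if (seen_labels[i] != NULL && strcmp(seen_labels[i], label) == 0)
            return 1;
    return 0;
}
```

Because the wrapper forwards every call, the target program behaves exactly as before; it never needs to know that its objects are being observed.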

Table 1 shows a hypothetical example of a list of first active-state commands identifying functions 
which can be performed in the first active state of the target computer program for the objects displayed in
the first active-state image of Figure 2. 
TABLE 1

OBJECT                SPOKEN COMMAND  FUNCTION

FRAME                 FRAME           CHANGES THE FOCUS TO THE ENTIRE FRAME
                      TOP BORDER      IDENTIFIES FRAME ELEMENT TO BE MOVED
                      BOTTOM BORDER   IDENTIFIES FRAME ELEMENT TO BE MOVED
                      LEFT BORDER     IDENTIFIES FRAME ELEMENT TO BE MOVED
                      RIGHT BORDER    IDENTIFIES FRAME ELEMENT TO BE MOVED
                      LEFT            MOVES FRAME OR FRAME ELEMENT LEFT
                      RIGHT           MOVES FRAME OR FRAME ELEMENT RIGHT
                      UP              MOVES FRAME OR FRAME ELEMENT UP
                      DOWN            MOVES FRAME OR FRAME ELEMENT DOWN
TITLE BAR             NONE            NONE
MENU BAR              CLOSE MENU      HIDES THE MENU
                      MENU            CHANGES THE FOCUS TO THE MENU BAR
                      SELECT          SELECTS THE ITEM AT THE CURSOR
  "ITEMS"             ITEMS           ACTIVATES THE "ITEMS" MENU
  "COLORS"            COLORS          ACTIVATES THE "COLORS" MENU
  "NAMES"             NAMES           ACTIVATES THE "NAMES" MENU
  "ADDRESSES"         ADDRESSES       ACTIVATES THE "ADDRESSES" MENU
  "OPTIONS"           OPTIONS         ACTIVATES A DIALOG TO SELECT OPTIONS
  "EXIT"              EXIT            EXITS THE CURRENT PROGRAM STATE
                      CANCEL          DISMISSES THE POP-UP MENU
SYSTEM MENU           CLOSE MENU      HIDES THE MENU
                      MENU            CHANGES THE FOCUS TO ANOTHER MENU, IF ANY
                      SELECT          SELECTS THE ITEM AT THE CURSOR
                      RESTORE         RESTORES WINDOW TO PREVIOUS SIZE AND POSITION
                      MINIMIZE        REDUCES WINDOW TO SMALLEST SIZE
                      MAXIMIZE        INCREASES WINDOW TO LARGEST SIZE
                      CLOSE           EXITS THE CURRENT PROGRAM STATE
                      WINDOW LIST     DISPLAYS A LIST OF RUNNING PROGRAMS
TABLE 1 (CONTINUED)

OBJECT                SPOKEN COMMAND  FUNCTION

VERTICAL SCROLL BAR   SCROLL BAR      SETS FOCUS ON THE NEXT SCROLL BAR
                      UP              MOVES THE LIST BOX UP THROUGH THE SUBJECT BEING DISPLAYED
                      DOWN            MOVES THE LIST BOX DOWN THROUGH THE SUBJECT BEING DISPLAYED
                      TOP             MOVES THE LIST BOX TO THE TOP OF THE SUBJECT BEING DISPLAYED
                      BOTTOM          MOVES THE LIST BOX TO THE BOTTOM OF THE SUBJECT BEING DISPLAYED
                      PAGE UP         MOVES THE LIST BOX UP ONE PAGE THROUGH THE SUBJECT BEING DISPLAYED
                      PAGE DOWN       MOVES THE LIST BOX DOWN ONE PAGE THROUGH THE SUBJECT BEING DISPLAYED
PUSH BUTTON           PRESS           EXECUTES THE PUSH BUTTON
                      PUSH BUTTON     EXECUTES THE PUSH BUTTON
  "HELP"              HELP            EXECUTES THE HELP FACILITY
LIST BOX              LIST BOX        CHANGES THE FOCUS TO LIST BOX
  "BLUE"              BLUE            SELECTS THE NAMED COLOR
  "GREEN"             GREEN           SELECTS THE NAMED COLOR
  "RED"               RED             SELECTS THE NAMED COLOR
  "ORANGE"            ORANGE          SELECTS THE NAMED COLOR
  "BLACK"             BLACK           SELECTS THE NAMED COLOR
  "WHITE"             WHITE           SELECTS THE NAMED COLOR
  "PURPLE"            PURPLE          SELECTS THE NAMED COLOR



As shown in the example of Table 1, each object may have zero or more commands identifying
functions which can be performed in the first active state of the target computer program. At least some
commands identify functions which can be performed on the identified object in the active-state image for
the state. For example, the command "FRAME" changes the focus to the entire frame object 14 of Figure
2. With the focus on the entire frame object 14, the spoken command "LEFT" operates on the frame object
by moving it to the left on the display screen.
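The way Table 1 derives commands from identified objects can be sketched as a lookup by object type, with list entries and labelled push buttons also contributing their visible text as a command. This is an illustrative sketch under that assumption, not the patent's own example code; `add_commands` and the abridged command tables are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OBJ_FRAME, OBJ_TITLE_BAR, OBJ_PUSH_BUTTON, OBJ_LIST_BOX, OBJ_LIST_ITEM } ObjType;

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Fixed spoken commands per object type, following Table 1 (abridged). */
static const char *const frame_cmds[] = {
    "FRAME", "TOP BORDER", "BOTTOM BORDER", "LEFT BORDER", "RIGHT BORDER",
    "LEFT", "RIGHT", "UP", "DOWN"
};
static const char *const push_button_cmds[] = { "PRESS", "PUSH BUTTON" };
static const char *const list_box_cmds[] = { "LIST BOX" };

/* Append the commands for one identified object to out[], which already
   holds n entries and has room for max. Returns the new entry count. */
size_t add_commands(ObjType type, const char *label,
                    const char **out, size_t n, size_t max) {
    const char *const *cmds = NULL;
    size_t k = 0, i;
    assert(n <= max);
    switch (type) {
    case OBJ_FRAME:       cmds = frame_cmds;       k = COUNT(frame_cmds);       break;
    case OBJ_PUSH_BUTTON: cmds = push_button_cmds; k = COUNT(push_button_cmds); break;
    case OBJ_LIST_BOX:    cmds = list_box_cmds;    k = COUNT(list_box_cmds);    break;
    case OBJ_TITLE_BAR:   /* Table 1: no spoken command */                      break;
    case OBJ_LIST_ITEM:   /* only its label, added below */                     break;
    }
    for (i = 0; i < k && n < max; i++)
        out[n++] = cmds[i];
    /* Labelled buttons and list entries are also addressed by their visible text. */
    if ((type == OBJ_PUSH_BUTTON || type == OBJ_LIST_ITEM) && label != NULL && n < max)
        out[n++] = label;
    return n;
}
```

Calling this once per identified object yields the active-state command list; an object such as the title bar simply contributes nothing.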

Returning again to Figure 1, the interactive computer system comprises a system acoustic command
model vocabulary store 28 for storing a system vocabulary of acoustic command models. Each acoustic 
command model represents one or more series of acoustic feature values representing an utterance of one 
or more words associated with the acoustic command model. 

The stored acoustic command models may be, for example, Markov models or other dynamic
programming models. The parameters of the acoustic command models may be estimated from a known
uttered training text (for example, 257 sentences) by, for example, smoothing parameters obtained by the
forward-backward algorithm. (See, for example, Jelinek, "Continuous Speech Recognition By Statistical
Methods", Proceedings of the IEEE, Volume 64, No. 4, April 1976, pages 532-536.)

Preferably, each acoustic command model represents a command spoken in isolation (that is, independent
of the context of prior and subsequent utterances). Context-independent acoustic command models can be
produced, for example, either manually from models of phonemes or automatically, for example, by the
method described by Lalit R. Bahl et al. in U.S. Patent 4,759,068 entitled "Constructing Markov Models of
Words from Multiple Utterances", or by any other known method of generating context-independent
models.

Alternatively, context-dependent models may be produced from context-independent models by
grouping utterances of a command into context-dependent categories. A context can be, for example, manually
selected, or automatically selected by tagging each feature signal corresponding to a command with its
context, and by grouping the feature signals according to their context to optimize a selected evaluation
function. (See, for example, Lalit R. Bahl et al., "Apparatus And Method Of Grouping Utterances Of A
Phoneme Into Context-Dependent Categories Based On Sound-Similarity For Automatic Speech
Recognition," U.S. Patent 5,195,167.)

As shown in the block diagram of Figure 1, the interactive computer system comprises an active-state
command model vocabulary identifier 30 for identifying a first active-state vocabulary of acoustic command
models for the first active state. The first active-state vocabulary comprises the acoustic command models
from the system vocabulary 28 representing the first active-state commands from the image object identifier
26. Example III, below, illustrates C programming language source code for identifying an active-state
vocabulary. Example IV, below, illustrates C programming language source code for defining the
active-state vocabulary to the speech recognizer.

Preferably, the active-state vocabulary comprises substantially less than all of the acoustic command
models in the system vocabulary. For example, each active-state vocabulary may comprise 50 to 200
commands. The entire system command vocabulary may comprise 500 to 700 or more commands. The
speech recognizer does not compare the measured feature signals for a time period to any acoustic
command model which is not in the active-state vocabulary for that time period.
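Restricting recognition to the active-state vocabulary amounts to selecting, from the system vocabulary store, only the models whose commands appear in the current active-state command list. A minimal sketch, assuming each stored model is tagged with its command word; `CommandModel` and `select_active_vocabulary` are hypothetical names, and global commands that apply in every active state could simply be appended to the command list before the call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry in the system vocabulary store. */
typedef struct {
    const char *word;   /* command the model represents, e.g. "PAGE UP" */
    int model_id;       /* stands in for the acoustic model parameters */
} CommandModel;

/* Write into active[] the indices of system-vocabulary models whose word
   appears in the active-state command list. Only these models are later
   compared against the feature signals. Returns the number selected. */
size_t select_active_vocabulary(const CommandModel *sys, size_t n_sys,
                                const char *const *cmds, size_t n_cmds,
                                size_t *active, size_t max_active) {
    size_t i, j, n = 0;
    assert(sys != NULL && cmds != NULL && active != NULL);
    for (i = 0; i < n_sys && n < max_active; i++) {
        for (j = 0; j < n_cmds; j++) {
            if (strcmp(sys[i].word, cmds[j]) == 0) {
                active[n++] = i;
                break;          /* add each model at most once */
            }
        }
    }
    return n;
}
```

A command that has no stored model is silently skipped here; a real system might instead report it so that a model can be trained for it.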

A speech recognizer 32 measures the value of at least one feature of an utterance during each of a
series of successive time intervals within the first time period to produce a series of feature signals. The
speech recognizer 32 compares the measured feature signals to each of the acoustic command models in
the first active-state vocabulary to generate a match score for the utterance and each acoustic command
model. The speech recognizer 32 outputs a command signal corresponding to the command model from
the first active-state vocabulary having the best match score.

Example V, below, illustrates C programming language source code for outputting a command signal
corresponding to the command model from an active-state vocabulary having the best match score. 
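As an illustration of match scoring restricted to the active-state vocabulary, the sketch below scores a series of quantized feature labels against small discrete Markov models with the forward algorithm and returns the best-scoring active model. It is a toy under stated assumptions (feature signals already quantized to three labels, tiny hand-set models) and is not Example V; `MarkovModel`, `match_score`, and `best_match` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STATES 4
#define N_LABELS 3   /* feature signals assumed quantized to labels 0..2 */

/* Toy discrete Markov model standing in for one acoustic command model. */
typedef struct {
    const char *word;                       /* command the model represents */
    int n_states;
    double init[MAX_STATES];                /* P(first state = i) */
    double trans[MAX_STATES][MAX_STATES];   /* P(next state = j | state = i) */
    double emit[MAX_STATES][N_LABELS];      /* P(label | state) */
} MarkovModel;

/* Forward algorithm: returns P(labels | model), used here as the match score.
   Plain probabilities keep the sketch short; long utterances would need
   per-interval scaling or log arithmetic to avoid underflow. */
double match_score(const MarkovModel *m, const int *labels, size_t T) {
    double alpha[MAX_STATES], next[MAX_STATES], p = 0.0;
    int i, j;
    size_t t;
    assert(T > 0 && m->n_states > 0 && m->n_states <= MAX_STATES);
    for (i = 0; i < m->n_states; i++)
        alpha[i] = m->init[i] * m->emit[i][labels[0]];
    for (t = 1; t < T; t++) {
        for (j = 0; j < m->n_states; j++) {
            next[j] = 0.0;
            for (i = 0; i < m->n_states; i++)
                next[j] += alpha[i] * m->trans[i][j];
            next[j] *= m->emit[j][labels[t]];
        }
        for (j = 0; j < m->n_states; j++)
            alpha[j] = next[j];
    }
    for (i = 0; i < m->n_states; i++)
        p += alpha[i];
    return p;
}

/* Score the labels only against the models listed in active[] (the
   active-state vocabulary); return the index into models[] of the best. */
size_t best_match(const MarkovModel *models, const size_t *active, size_t n_active,
                  const int *labels, size_t T) {
    size_t k, best;
    double best_score;
    assert(n_active > 0);
    best = active[0];
    best_score = match_score(&models[best], labels, T);
    for (k = 1; k < n_active; k++) {
        double s = match_score(&models[active[k]], labels, T);
        if (s > best_score) {
            best_score = s;
            best = active[k];
        }
    }
    return best;
}
```

Models absent from `active[]` are never scored, which is where the speed gain from a small active-state vocabulary comes from.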

Figure 3 is a block diagram of an example of a speech recognizer for an interactive computer system
according to the invention. In this example, the speech recognizer 32 comprises an active-state acoustic
command model store 34 for storing the active-state vocabulary comprising the acoustic command models 
from the system vocabulary store 28 representing the active-state commands identified in active state 
command model vocabulary identifier 30. 

The speech recognizer 32 further comprises an acoustic processor 36 for measuring the value of at
least one feature of an utterance during each of a series of successive time intervals within each
active-state time period to produce a series of feature signals. An acoustic match score processor 38 compares
the measured feature signals from acoustic processor 36 to each of the acoustic command models in the
active-state acoustic command model store 34 to generate a match score for the utterance and each
acoustic command model. An output 40 outputs one or more command signals corresponding to the
command models from the active-state vocabulary having the best match scores for a given time period.
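The acoustic processor's job of measuring one feature per time interval can be illustrated with the simplest possible feature, the mean-square amplitude (energy) of each frame of samples. This is only a sketch: practical acoustic processors measure spectral features, and the function `frame_energies` and its parameters are hypothetical. With 16 kHz sampling, for example, a `frame_len` of 160 samples would give 10 ms intervals.

```c
#include <assert.h>
#include <stddef.h>

/* Split samples into successive non-overlapping frames of frame_len samples
   and measure one feature per frame: the mean-square amplitude.
   Returns the number of complete frames written to features[]. */
size_t frame_energies(const double *samples, size_t n_samples, size_t frame_len,
                      double *features, size_t max_frames) {
    size_t f, i, n_frames;
    assert(frame_len > 0);
    n_frames = n_samples / frame_len;     /* a trailing partial frame is dropped */
    if (n_frames > max_frames)
        n_frames = max_frames;
    for (f = 0; f < n_frames; f++) {
        const double *frame = samples + f * frame_len;
        double sum = 0.0;
        for (i = 0; i < frame_len; i++)
            sum += frame[i] * frame[i];
        features[f] = sum / (double)frame_len;
    }
    return n_frames;
}
```

The resulting series of per-interval values plays the role of the "series of feature signals" that the match score processor compares with the active-state models.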

Preferably, only one command signal corresponding to the command model from the first active-state 
vocabulary having the best match score is output. In this case, the one output command may be
immediately executed. If two or more command signals corresponding to the command models from the
active-state vocabulary having the best match scores for a given time period are output, then the
recognized commands may be displayed for the user to select one for execution.
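When two or more command signals are output for the user to choose from, the recognizer needs the few best-scoring commands rather than the single best. A minimal sketch of that selection, assuming the match scores for the active-state vocabulary are already in an array and that higher is better; `select_n_best` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Write into best[] the indices of the n_max highest match scores, best first.
   scores[k] is the match score of the k-th model in the active-state
   vocabulary (higher is better). Returns the number of indices written. */
size_t select_n_best(const double *scores, size_t n_scores,
                     size_t *best, size_t n_max) {
    size_t k, m, pos, filled = 0;
    assert(scores != NULL && best != NULL);
    for (k = 0; k < n_scores; k++) {
        /* Find where scores[k] belongs in the descending list kept so far;
           on ties the earlier model keeps the higher rank. */
        pos = filled;
        while (pos > 0 && scores[best[pos - 1]] < scores[k])
            pos--;
        if (pos >= n_max)
            continue;                   /* not among the n_max best so far */
        if (filled < n_max)
            filled++;
        /* Shift lower-ranked entries down one place; when the list is
           already full, the previous last entry falls off. */
        for (m = filled - 1; m > pos; m--)
            best[m] = best[m - 1];
        best[pos] = k;
    }
    return filled;
}
```

A simple insertion pass keeps the sketch short; with an active-state vocabulary of 50 to 200 commands and a handful of candidates, its cost is negligible.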

The speech recognizer may be a publicly available product such as the IBM Voice Type II (trademark)
or the IBM Speech Server Series (trademark). In products containing a fast acoustic match and a detailed
acoustic match, both acoustic matches may be used in the invention. Alternatively, since the image object
identifier 26 and the active-state command model vocabulary identifier 30 select only a small subset of the
system vocabulary in store 28 for the acoustic match, the fast acoustic match can be omitted.

In speech recognition products containing a language model, the language model can be omitted. 
Alternatively, all of the words in the active-state vocabulary can be assigned equal language model 
probabilities. 
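Why equal language-model probabilities are equivalent to omitting the language model can be seen from the usual decision rule, written here with assumed notation: A is the series of feature signals for a time period, V_t the active-state vocabulary for that period, P(A | c) the acoustic match score of command c, and P(c) its language-model probability.

```latex
\hat{c} \;=\; \arg\max_{c \in V_t} P(A \mid c)\,P(c),
\qquad P(c) = \frac{1}{|V_t|} \ \text{for all } c \in V_t
\;\;\Longrightarrow\;\;
\hat{c} \;=\; \arg\max_{c \in V_t} P(A \mid c)
```

Because the constant factor 1/|V_t| multiplies every candidate's score, it cannot change which command wins.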

In speech recognizer products having hypothesis search algorithms for generating multiple-word 
hypotheses, the recognition of a word is dependent in part on the recognition of successive words. Such a
hypothesis search algorithm need not be used with the present invention in which, preferably, each 
command is independent of successive commands. 

Preferably, both the target computer program and the speech recognizer are executed on the same 
central processing unit in a time sharing manner. Alternatively, the target computer program and the speech 
recognizer can be executed on different central processing units, for example using a client-server
architecture. 

In the interactive computer system according to the invention, the display means may further display at
least a second active-state image different from the first active-state image for a second active state 
occurring during a second time period different from the first time period. 

Figure 4 shows an example of a second active-state image for a second active state of the target
computer program. The second active-state image shown in Figure 4 contains a frame object 42, a title bar
object 44, a system menu object 46, a vertical scroll bar object 48, a horizontal scroll bar object 50, and a
container object 52. The container object 52 contains an "editor" object, a "phone book" object, a
"spreadsheet" object, a "mail" object, and a "solitaire" object.

The object identifying means identifies at least one object displayed in the second active-state image,
and generates from the identified object a list of one or more second active-state commands identifying
functions which can be performed in the second active state of the target computer program.

Table 2 is an example of a hypothetical list of commands for each object shown in Figure 4 identifying
functions which can be performed in the second active state of the target computer program.
TABLE 2

OBJECT                SPOKEN COMMAND  FUNCTION

FRAME                 FRAME           CHANGES THE FOCUS TO THE ENTIRE FRAME
                      TOP BORDER      IDENTIFIES FRAME ELEMENT TO BE MOVED
                      BOTTOM BORDER   IDENTIFIES FRAME ELEMENT TO BE MOVED
                      LEFT BORDER     IDENTIFIES FRAME ELEMENT TO BE MOVED
                      RIGHT BORDER    IDENTIFIES FRAME ELEMENT TO BE MOVED
                      LEFT            MOVES FRAME OR FRAME ELEMENT LEFT
                      RIGHT           MOVES FRAME OR FRAME ELEMENT RIGHT
                      UP              MOVES FRAME OR FRAME ELEMENT UP
                      DOWN            MOVES FRAME OR FRAME ELEMENT DOWN
TITLE BAR             NONE            NONE
SYSTEM MENU           CLOSE MENU      HIDES THE MENU
                      MENU            CHANGES THE FOCUS TO ANOTHER MENU, IF ANY
                      SELECT          SELECTS THE ITEM AT THE CURSOR
                      RESTORE         RESTORES WINDOW TO PREVIOUS SIZE AND POSITION
                      MINIMIZE        REDUCES WINDOW TO SMALLEST SIZE
                      MAXIMIZE        INCREASES WINDOW TO LARGEST SIZE
                      CLOSE           EXITS THE CURRENT PROGRAM STATE
                      WINDOW LIST     DISPLAYS A LIST OF RUNNING PROGRAMS
VERTICAL SCROLL BAR   SCROLL BAR      SETS FOCUS ON THE NEXT SCROLL BAR
                      UP              MOVES THE CONTAINER UP THROUGH THE SUBJECT BEING DISPLAYED
                      DOWN            MOVES THE CONTAINER DOWN THROUGH THE SUBJECT BEING DISPLAYED
                      TOP             MOVES THE CONTAINER TO THE TOP OF THE SUBJECT BEING DISPLAYED
                      BOTTOM          MOVES THE CONTAINER TO THE BOTTOM OF THE SUBJECT BEING DISPLAYED
                      PAGE UP         MOVES THE CONTAINER UP ONE PAGE THROUGH THE SUBJECT BEING DISPLAYED
                      PAGE DOWN       MOVES THE CONTAINER DOWN ONE PAGE THROUGH THE SUBJECT BEING DISPLAYED
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TABLE 2 (CONTINUED) 




SPOKElf 






LunriANii 


FUNCTION 


HORIZONTAL 






SCROLL BAR 


SCROLL BAR 


SETS FOCOS OH THE NEXT SCROLL BAR 




LEFT 


MOVES THE CONTAINER LEFT THROUGH THE 






SUBJECT BEING DISPLAYED 






MOVES THE CONTAINER RIGHT THROUGH THE 






SUBJECT BEING DISPLAYED 




EXTREME LEFT 


MOVES THE CONTAINER TO THE KmEWE 






LEFT OF THE SUBJECT BEING DISPLAYED 




EXTREME RIGHT 


MOVES THE CONTAINER TO THE EXTREME 






RIGHT OF THE SUBJECT BEING DISPLAYED 




PAGE LEFT 


MOVES THE CONTAINER LEFT ONE PAGE 






ToXuUGn THE SUBJECT BEING DISPLAYED 




PAGE RIGHT 


MOVES THE CONTAINER RIGHT ONE PAGE 






THROUGH THE SUBJECT BEING DISPLAYED 


fCMT A T WT7D 


CONTAINSR 


CHANCES THE FOCUS TO THE CONTAINER 




SELECT ALL 


EXECUTES ALL PROGRAMS IK THE CONTAINER 


EDITOR 




exKcurts the editor program 


PHONE BOOK 


PHOHE BOOK 


EXECUTES THE PHOHE BOOK PROGRAM 


SPREADSHEET 


SPREADSHEET 


EXECUTES THE SPREADSHEET PROGRAM 


HAIL 


MAIL 


EXECUTES THE MAIL PROGRAM 


SOLITAIRE 


SOLITAIRE 


EXECUTES THE SOLITAIRE PROGRAM 
Comparing Figures 2 and 4, the first active-state image differs from the second active-state image by
providing menu bar object 18, list box object 20, and push button object 22 in the first active-state image
but not in the second active-state image. The horizontal scroll bar 50 and the editor, phone book, mail,
spreadsheet, and solitaire objects are provided in the second active-state image, but not in the first
active-state image.

The active-state vocabulary identifying means further identifies a second active-state vocabulary of
acoustic command models for the second active state. The second active-state vocabulary comprises the
acoustic command models from the system vocabulary representing the second active-state commands.
The second active-state vocabulary is at least partly different from the first active-state vocabulary.

Comparing Tables 1 and 2, the first active-state vocabulary comprises the spoken commands listed in 
Table 1. The second active-state vocabulary comprises the spoken commands listed in Table 2. In this 
example, the first active-state vocabulary is at least partly different from the second active-state vocabulary 
as shown therein. 

The speech recognizer measures the value of at least one feature of an utterance during each of a 
series of successive time intervals within the second time period to produce a series of feature signals. The 
speech recognizer compares the measured feature signals for the second time period to each of the 
acoustic command models in the second active-state vocabulary to generate a match score for the 
utterance and each acoustic command model. The speech recognizer outputs a command signal
corresponding to the command model from the second active-state vocabulary having the best match score.
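The selection step can be illustrated with a minimal C sketch. It assumes the match scores come from some acoustic scorer, represented here by a hypothetical callback type `ScoreFn`; the `ToyScores` table and `toy_score` function are illustrative stand-ins, not part of the described system. The point shown is that only commands in the current active-state vocabulary are scored, so a command outside that vocabulary cannot be output even if it would have matched better.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scoring callback: match score of the current utterance
   against the acoustic model of one command (higher is better). */
typedef double (*ScoreFn)(const char *command, void *ctx);

/* Scores only the commands in the active-state vocabulary and returns the
   one with the best match score, or NULL if the vocabulary is empty. */
const char *best_command(const char *const *active, size_t n,
                         ScoreFn score, void *ctx) {
    const char *best = NULL;
    double best_score = 0.0;
    for (size_t i = 0; i < n; i++) {
        double s = score(active[i], ctx);
        if (best == NULL || s > best_score) {
            best = active[i];
            best_score = s;
        }
    }
    return best;
}

/* Toy stand-in for an acoustic scorer: looks the command up in a fixed
   table of precomputed scores. */
typedef struct {
    const char *const *names;
    const double *scores;
    size_t n;
} ToyScores;

double toy_score(const char *command, void *ctx) {
    const ToyScores *t = ctx;
    for (size_t i = 0; i < t->n; i++)
        if (strcmp(t->names[i], command) == 0)
            return t->scores[i];
    return -1e300;  /* no model for this command in the toy table */
}
```

For example, if "EDITOR" has the highest toy score but is not in the active list {"FRAME", "CLOSE", "MAXIMIZE"}, the function returns the best of the three active commands instead.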

Preferably, the target computer program has only one active state occurring during each time period. 

Figure 5 is a block diagram of an example of the acoustic command model vocabulary store 28 of 
Figure 1. The system vocabulary may comprise, for example, a set of global acoustic command models 
representing global commands identifying functions which can be performed in every active state of the 
target computer program. 

Table 3 lists some examples of global commands represented by global acoustic command models. 
TABLE 3

Global Commands

Spoken Command    Function
MICROPHONE OFF    turns the microphone off
ENTER             sends "ENTER" keystroke to keyboard input buffer
LEFT              sends "LEFT ARROW" keystroke to keyboard input buffer
RIGHT             sends "RIGHT ARROW" keystroke to keyboard input buffer
PASTE             inserts contents of clipboard into application with the focus
WINDOW LIST       displays a list of running programs
EDITOR            executes the editor program
DESK TOP          makes the desk top window active

The system vocabulary may further comprise object type acoustic command models associated with
different types of objects. For example, as shown in Tables 1 and 2, frame object type acoustic commands
include "frame", "top border", "bottom border", "left border", "right border", "left", "right", "up", and
"down". Vertical scroll bar object type acoustic commands include "scroll bar", "up", "down", "top",
"bottom", "page up", and "page down". Push button object type acoustic command models include "press"
and "push button".
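Because object type command models are shared by every object of the same type, one natural layout is a table keyed by object type. The following C sketch is only an illustration: it uses command strings in place of acoustic models, string type names in place of window classes, and the names `ObjectTypeCommands` and `commands_for_type` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Commands shared by every object of one type (hypothetical layout). */
typedef struct {
    const char *type;
    const char *const *commands;
    size_t ncommands;
} ObjectTypeCommands;

static const char *const frame_cmds[] = {
    "frame", "top border", "bottom border", "left border", "right border",
    "left", "right", "up", "down"};
static const char *const vscroll_cmds[] = {
    "scroll bar", "up", "down", "top", "bottom", "page up", "page down"};
static const char *const button_cmds[] = {"press", "push button"};

static const ObjectTypeCommands object_types[] = {
    {"frame", frame_cmds, COUNT(frame_cmds)},
    {"vertical scroll bar", vscroll_cmds, COUNT(vscroll_cmds)},
    {"push button", button_cmds, COUNT(button_cmds)},
};

/* Returns the command list for an object type, or NULL if the type has
   no object type commands in this table. */
const ObjectTypeCommands *commands_for_type(const char *type) {
    for (size_t i = 0; i < COUNT(object_types); i++)
        if (strcmp(object_types[i].type, type) == 0)
            return &object_types[i];
    return NULL;
}
```

Note that "up" and "down" appear under two object types, so a vocabulary assembled from several objects needs duplicate removal.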

Finally, the system vocabulary includes application-specific acoustic command models representing
application-specific objects. In the examples of Tables 1 and 2, application-specific objects include the
words "items", "colors", "names", "addresses", "phone book", "spreadsheet", "mail", and "solitaire".

The display means 12 of Figure 1 may display both an active-state image for an active state occurring
during a time period, and at least a portion of one or more images for program states not occurring during
the time period.

One example of the acoustic processor 36 of Figure 3 is shown in Figure 6. The acoustic processor
comprises a microphone 54 for generating an analog electrical signal corresponding to the utterance. The
analog electrical signal from microphone 54 is converted to a digital electrical signal by analog to digital
converter 56. For this purpose, the analog signal may be sampled, for example, at a rate of twenty kilohertz
by the analog to digital converter 56.

A window generator 58 obtains, for example, a twenty millisecond duration sample of the digital signal
from analog to digital converter 56 every ten milliseconds (one centisecond). Each twenty millisecond
sample of the digital signal is analyzed by spectrum analyzer 60 in order to obtain the amplitude of the
digital signal sample in each of, for example, twenty frequency bands. Preferably, spectrum analyzer 60
also generates a twenty-first dimension signal representing the total amplitude or total power of the twenty
millisecond digital signal sample. The spectrum analyzer 60 may be, for example, a fast Fourier transform
processor. Alternatively, it may be a bank of twenty band pass filters.
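At the example rates above, a twenty millisecond window holds 400 samples and a new window starts every 200 samples, so successive windows overlap by half. The C sketch below shows only this framing arithmetic and one possible definition of the twenty-first "total power" dimension (mean squared sample value, an assumption); the band analysis itself (FFT or filter bank) is not shown, and the function names are hypothetical.

```c
#include <stddef.h>

#define SAMPLE_RATE_HZ 20000
#define WINDOW_SAMPLES (SAMPLE_RATE_HZ / 50)   /* 20 ms -> 400 samples */
#define HOP_SAMPLES    (SAMPLE_RATE_HZ / 100)  /* 10 ms -> 200 samples */

/* Number of complete 20 ms windows in n samples when a new window
   starts every 10 ms (successive windows overlap by half). */
size_t window_count(size_t n) {
    return n < WINDOW_SAMPLES ? 0 : (n - WINDOW_SAMPLES) / HOP_SAMPLES + 1;
}

/* Mean squared sample value of window w: one possible definition of
   the twenty-first "total power" dimension. */
double window_power(const short *samples, size_t w) {
    const short *start = samples + w * HOP_SAMPLES;
    double sum = 0.0;
    for (size_t i = 0; i < WINDOW_SAMPLES; i++)
        sum += (double)start[i] * start[i];
    return sum / WINDOW_SAMPLES;
}
```

One second of audio (20,000 samples) therefore yields 99 complete windows under this framing.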

The twenty-one dimension vector signals produced by spectrum analyzer 60 may be adapted to
remove background noise by an adaptive noise cancellation processor 62. Noise cancellation processor 62
subtracts a noise vector N(t) from the feature vector F(t) input into the noise cancellation processor to
produce an output feature vector F'(t). The noise cancellation processor 62 adapts to changing noise levels
by periodically updating the noise vector N(t) whenever the prior feature vector F(t-1) is identified as noise
or silence. The noise vector N(t) is updated according to the formula

$$N(t) = \frac{N(t-1) + k\,[F(t-1) - F_p(t-1)]}{1 + k} \tag{1}$$

where N(t) is the noise vector at time t, N(t-1) is the noise vector at time (t-1), k is a fixed parameter of the
adaptive noise cancellation model, F(t-1) is the feature vector input into the noise cancellation processor
62 at time (t-1), which represents noise or silence, and Fp(t-1) is the silence or noise prototype vector
from store 64 that is closest to feature vector F(t-1).

The prior feature vector F(t-1) is recognized as noise or silence if either (a) the total energy of the
vector is below a threshold, or (b) the closest prototype vector in adaptation prototype vector store 66 to the
feature vector is a prototype representing noise or silence. For the purpose of the analysis of the total
energy of the feature vector, the threshold may be, for example, the fifth percentile of all feature vectors
(corresponding to both speech and silence) produced in the two seconds prior to the feature vector being
evaluated.
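The noise update of Equation 1 and the energy test (a) can be sketched in C as follows. This is a sketch under stated assumptions: two seconds is taken as 200 centisecond frames, the fifth percentile uses the nearest-rank rule, and the nearest-prototype search of test (b) is not shown (the caller passes the prototype and the silence decision). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define DIM 21            /* twenty bands plus total power */
#define HISTORY 200       /* two seconds of centisecond frames */

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Fifth-percentile (nearest-rank) energy of the last n frames,
   1 <= n <= HISTORY; a frame below it counts as silence under test (a).
   Returns 0.0 for an out-of-range n. */
double energy_threshold(const double *energies, size_t n) {
    double sorted[HISTORY];
    if (n == 0 || n > HISTORY)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        sorted[i] = energies[i];
    qsort(sorted, n, sizeof sorted[0], cmp_double);
    return sorted[(5 * n + 99) / 100 - 1];
}

/* One step of adaptive noise cancellation. noise holds N(t-1) on entry
   and N(t) on exit; it is updated with Equation 1 only when the previous
   vector prev was noise or silence. proto is the prototype Fp(t-1)
   closest to prev. The output is F'(t) = F(t) - N(t). */
void cancel_noise(const double f[DIM], const double prev[DIM],
                  const double proto[DIM], int prev_is_silence,
                  double k, double noise[DIM], double out[DIM]) {
    if (prev_is_silence)
        for (int i = 0; i < DIM; i++)
            noise[i] = (noise[i] + k * (prev[i] - proto[i])) / (1.0 + k);
    for (int i = 0; i < DIM; i++)
        out[i] = f[i] - noise[i];
}
```

Written this way, Equation 1 is a weighted average of the old noise estimate (weight 1) and the newest observed offset F(t-1) - Fp(t-1) (weight k), so larger k adapts faster.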

After noise cancellation, the feature vector F'(t) is normalized to adjust for variations in the loudness of
the input speech by short term mean normalization processor 68. Normalization processor 68 normalizes
the twenty-one dimension feature vector F'(t) to produce a twenty dimension normalized feature vector X(t).
The twenty-first dimension of the feature vector F'(t), representing the total amplitude or total power, is
discarded. Each component i of the normalized feature vector X(t) at time t may, for example, be given by
the equation

$$X_i(t) = F'_i(t) - Z(t) \tag{2}$$

in the logarithmic domain, where F'i(t) is the i-th component of the unnormalized vector at time t, and where
Z(t) is a weighted mean of the components of F'(t) and Z(t-1) according to Equations 3 and 4:

$$Z(t) = 0.9\,Z(t-1) + 0.1\,M(t) \tag{3}$$

and where

$$M(t) = \frac{1}{20}\sum_{i=1}^{20} F'_i(t) \tag{4}$$
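Equations 2 to 4 can be sketched in C as below, assuming (as in Equation 4 above) that M(t) is the mean of the twenty band components. The text does not state an initial value for Z, so the caller supplies one; the function name `normalize` is hypothetical.

```c
#define NBANDS 20

/* Short-term mean normalization (Equations 2 to 4), log domain.
   fp : F'(t); its 21st component (index NBANDS, total power) is ignored.
   x  : normalized twenty-component output X(t).
   z  : running mean, Z(t-1) on entry and Z(t) on exit. */
void normalize(const double fp[NBANDS + 1], double x[NBANDS], double *z) {
    double m = 0.0;
    for (int i = 0; i < NBANDS; i++)
        m += fp[i];
    m /= NBANDS;                 /* M(t): mean of the twenty bands */
    *z = 0.9 * *z + 0.1 * m;     /* Z(t), Equation 3 */
    for (int i = 0; i < NBANDS; i++)
        x[i] = fp[i] - *z;       /* X_i(t), Equation 2 */
}
```

Because the features are logarithmic, subtracting Z(t) removes a slowly varying overall gain, which is how the loudness variation is compensated.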

The normalized twenty dimension feature vector X(t) may be further processed by an adaptive labeler 70 to
adapt to variations in pronunciation of speech sounds. An adapted twenty dimension feature vector X'(t) is
generated by subtracting a twenty dimension adaptation vector A(t) from the twenty dimension feature
vector X(t) provided to the input of the adaptive labeler 70. The adaptation vector A(t) at time t may, for
example, be given by the formula

$$A(t) = \frac{A(t-1) + k\,[X(t-1) - X_p(t-1)]}{1 + k} \tag{5}$$

where k is a fixed parameter of the adaptive labeling model, X(t-1) is the normalized twenty dimension
vector input to the adaptive labeler 70 at time (t-1), Xp(t-1) is the adaptation prototype vector (from
adaptation prototype store 66) closest to the twenty dimension feature vector X(t-1) at time (t-1), and A(t-1)
is the adaptation vector at time (t-1).
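A C sketch of one adaptive labeler step follows. The text does not define "closest", so squared Euclidean distance is assumed here; prototypes are stored as a flat row-major array, and `nearest_prototype` and `adapt` are hypothetical names.

```c
#include <float.h>
#include <stddef.h>

#define NDIM 20

/* Index of the prototype (rows of a row-major nprotos x NDIM array)
   nearest to x in squared Euclidean distance. */
size_t nearest_prototype(const double x[NDIM], const double *protos,
                         size_t nprotos) {
    size_t best = 0;
    double best_d = DBL_MAX;
    for (size_t p = 0; p < nprotos; p++) {
        double d = 0.0;
        for (int i = 0; i < NDIM; i++) {
            double diff = x[i] - protos[p * NDIM + i];
            d += diff * diff;
        }
        if (d < best_d) {
            best_d = d;
            best = p;
        }
    }
    return best;
}

/* One adaptive labeler step: update A(t) from X(t-1) and its nearest
   adaptation prototype Xp(t-1) (Equation 5), then X'(t) = X(t) - A(t).
   a holds A(t-1) on entry and A(t) on exit. */
void adapt(const double x[NDIM], const double x_prev[NDIM],
           const double *protos, size_t nprotos, double k,
           double a[NDIM], double x_adapted[NDIM]) {
    const double *xp = protos + nearest_prototype(x_prev, protos, nprotos) * NDIM;
    for (int i = 0; i < NDIM; i++) {
        a[i] = (a[i] + k * (x_prev[i] - xp[i])) / (1.0 + k);
        x_adapted[i] = x[i] - a[i];
    }
}
```

The update has the same weighted-average form as the noise vector update of Equation 1, applied to the offset between the speech vector and its nearest prototype.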
The twenty dimension adapted feature vector signal X'(t) from the adaptive labeler 70 is preferably
provided to an auditory model 72. Auditory model 72 may, for example, provide a model of how the human
auditory system perceives sound signals. An example of an auditory model is described in U.S. Patent
4,980,918 to Bahl et al. entitled "Speech Recognition System with Efficient Storage and Rapid Assembly of
Phonological Graphs".

Preferably, according to the present invention, for each frequency band i of the adapted feature vector
signal X'(t) at time t, the auditory model 72 calculates a new parameter Ei(t) according to Equations 6 and 7:

$$E_i(t) = K_1 + K_2\,X'_i(t)\,N_i(t-1) \tag{6}$$

where

$$N_i(t) = K_3 \times N_i(t-1) - E_i(t) \tag{7}$$

and where K1, K2, and K3 are fixed parameters of the auditory model.

For each centisecond time interval, the output of the auditory model 72 is a modified twenty dimension
feature vector signal. This feature vector is augmented by a twenty-first dimension having a value equal to
the square root of the sum of the squares of the values of the other twenty dimensions.
For each centisecond time interval, a concatenator 74 preferably concatenates nine twenty-one
dimension feature vectors representing the one current centisecond time interval, the four preceding
centisecond time intervals, and the four following centisecond time intervals to form a single spliced vector
of 189 dimensions. Each 189 dimension spliced vector is preferably multiplied in a rotator 76 by a rotation
matrix to rotate the spliced vector and to reduce the spliced vector to fifty dimensions.
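Splicing and rotation amount to a copy followed by a matrix-vector product. The C sketch below assumes the 50 x 189 rotation matrix has already been obtained by training and is stored row-major; frames are stored as a flat array of twenty-one dimension vectors. The function names are hypothetical.

```c
#define FRAME_DIM 21
#define SPLICE_N 9
#define SPLICED_DIM (FRAME_DIM * SPLICE_N)   /* 189 */
#define ROTATED_DIM 50

/* Concatenates the frames t-4 .. t+4 (flat array of FRAME_DIM-vectors)
   into one spliced vector. The caller must ensure that t-4 >= 0 and
   that frame t+4 exists. */
void splice(const double *frames, int t, double out[SPLICED_DIM]) {
    int j = 0;
    for (int d = -4; d <= 4; d++)
        for (int i = 0; i < FRAME_DIM; i++)
            out[j++] = frames[(t + d) * FRAME_DIM + i];
}

/* y = R v, with R the 50 x 189 rotation matrix stored row-major. */
void rotate(const double *r, const double v[SPLICED_DIM],
            double y[ROTATED_DIM]) {
    for (int row = 0; row < ROTATED_DIM; row++) {
        double s = 0.0;
        for (int col = 0; col < SPLICED_DIM; col++)
            s += r[row * SPLICED_DIM + col] * v[col];
        y[row] = s;
    }
}
```

Each output component is the projection of the spliced context window onto one of the fifty trained directions.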

The rotation matrix used in rotator 76 may be obtained, for example, by classifying into M classes a set
of 189 dimension spliced vectors obtained during a training session. The covariance matrix for all of the
spliced vectors in the training set is multiplied by the inverse of the within-class covariance matrix for all of
the spliced vectors in all M classes. The first fifty eigenvectors of the resulting matrix form the rotation
matrix. (See, for example, "Vector Quantization Procedure For Speech Recognition Systems Using Discrete
Parameter Phoneme-Based Markov Word Models" by L. R. Bahl et al., IBM Technical Disclosure Bulletin,
Volume 32, No. 7, December 1989, pages 320 and 321.)

Window generator 58, spectrum analyzer 60, adaptive noise cancellation processor 62, short term mean
normalization processor 68, adaptive labeler 70, auditory model 72, concatenator 74, and rotator 76 may be
suitably programmed special purpose or general purpose digital signal processors. Prototype stores 64 and
66 may be electronic computer memory of the types discussed above.

The prototype vectors in prototype store 64 may be obtained, for example, by clustering feature vector
signals from a training set into a plurality of clusters, and then calculating the mean and standard deviation
for each cluster to form the parameter values of the prototype vector. When the training script comprises a
series of word-segment models (forming a model of a series of words), and each word-segment model
comprises a series of elementary models having specified locations in the word-segment models, the
feature vector signals may be clustered by specifying that each cluster corresponds to a single elementary
model in a single location in a single word-segment model. Such a method is described in more detail in
U.S. Patent Application Serial No. 730,714, filed on July 16, 1991, entitled "Fast Algorithm for Deriving
Acoustic Prototypes for Automatic Speech Recognition."

Alternatively, all acoustic feature vectors generated by the utterance of a training text and which correspond
to a given elementary model may be clustered by K-means Euclidean clustering or K-means Gaussian
clustering, or both. Such a method is described, for example, by Bahl et al. in U.S. Patent 5,182,773 entitled
"Speaker-Independent Label Coding Apparatus".

Figure 7 schematically shows a hypothetical example of an acoustic command model. The hypothetical
model shown in Figure 7 has a starting state S1, an ending state S4, and a plurality of paths from the
starting state S1 to the ending state S4.

Figure 8 schematically shows a hypothetical example of an acoustic Markov model of a phoneme. In
this example, the acoustic phoneme model comprises three occurrences of transition T1, four occurrences
of transition T2, and three occurrences of transition T3. The transitions shown in dotted lines are null
transitions.

Each solid-line transition in the acoustic models of Figures 7 and 8 has at least one model output
comprising an acoustic feature value. Each model output has an output probability. Each null transition has
no output. Each solid-line transition and each dotted-line transition from a state has a probability of
occurrence when the model is in that state.

Figure 9 shows a hypothetical example of paths through the acoustic model of Figure 7. The match
score for an utterance and an acoustic command model is the sum of the probabilities of the measured
features of the utterance for all paths through the acoustic command model. For each path, the probability
of the measured features of the utterance is equal to the product of the probabilities of the transitions along
the path times the probabilities of the measured features at each transition along the path.
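The sum over all paths need not be computed path by path: the forward algorithm accumulates it state by state, one measured feature at a time. The C sketch below is an illustration under assumptions: features are discrete labels (indices into each transition's output table) rather than feature values, the model is left-to-right with transitions listed in order of increasing source state so that chained null transitions propagate correctly, and all type and function names are hypothetical.

```c
#include <stddef.h>

#define MAX_STATES 16
#define MAX_LABELS 8

/* One transition of an acoustic Markov model. */
typedef struct {
    int from, to;             /* source and destination state */
    double p;                 /* transition probability */
    int is_null;              /* nonzero: null transition, no output */
    double out[MAX_LABELS];   /* output probability of each label */
} Transition;

/* Propagates probability along null transitions within one time step
   (transitions must be listed in order of increasing source state). */
static void apply_null(const Transition *tr, size_t ntr, double *alpha) {
    for (size_t k = 0; k < ntr; k++)
        if (tr[k].is_null)
            alpha[tr[k].to] += alpha[tr[k].from] * tr[k].p;
}

/* Sum, over all paths from `start` to `end` that produce labels[0..T-1],
   of the product of transition and output probabilities. */
double match_score(const Transition *tr, size_t ntr, int nstates,
                   int start, int end, const int *labels, size_t T) {
    double alpha[MAX_STATES] = {0}, next[MAX_STATES];
    alpha[start] = 1.0;
    apply_null(tr, ntr, alpha);
    for (size_t t = 0; t < T; t++) {
        for (int s = 0; s < nstates; s++)
            next[s] = 0.0;
        for (size_t k = 0; k < ntr; k++)
            if (!tr[k].is_null)
                next[tr[k].to] += alpha[tr[k].from] * tr[k].p
                                  * tr[k].out[labels[t]];
        apply_null(tr, ntr, next);
        for (int s = 0; s < nstates; s++)
            alpha[s] = next[s];
    }
    return alpha[end];
}
```

For a three-state toy model with a self-loop on the first state, one null and one emitting transition into the final state, and the label sequence (a, b), the two surviving paths contribute 0.056 and 0.045, and the function returns their sum, 0.101.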

Preferably, the interactive computer system according to the invention may be made by suitably 
programming a general purpose digital computer system. More specifically, the processor 10, the image 
object identifier 26, and the active-state command model vocabulary identifier 30 may be made by suitably 
programming a general purpose digital processor. The system acoustic command model vocabulary store
28 and the active-state acoustic command models store 34 may be electronic computer memory. The
display means 12 may comprise a video display such as a cathode ray tube, a liquid crystal display, or a
printer.

As mentioned above, the target computer program may be one or more application programs and an 
operating system program. For example, the target computer program may be IBM OS/2 (trademark)
version 2.0, and Presentation Manager (trademark).

IBM's OS/2 version 2.0 operating system and Presentation Manager have application program interface
calls in various languages, including the C programming language, the assembly programming language,
and the REXX programming language. The complete collection of application program interface calls is part
of the OS/2 2.0 Technical Library. The syntax for the application program interface calls in a language is
compatible with how standard calls operate in the language. The name of a particular application program
interface call may be different for different languages. Also, some aspects of the application program
interface in one language may not be supported from another language.

For the C programming language, the application program interface consists of many library calls. C
programming language source code can be compiled with the IBM C Set/2 compiler.

Examples I to V illustrate C programming language source code for OS/2 and Presentation Manager for
(a) creating and displaying an image, (b) reading the active-state image to identify at least one object
displayed in the active-state image, (c) creating the vocabulary from the active-state image, (d) defining the
vocabulary to the speech recognizer, and (e) outputting a command signal corresponding to the command
model from an active-state vocabulary having the best match score.

Example I 

Example I illustrates C programming language source code for creating the hypothetical first
active-state image shown in Figure 2.

There is a concept of a "standard window" in OS/2 and Presentation Manager. A standard window is a
combination of several commonly-used windows. In Figure 2, the frame window, title bar, system menu and
menu bar can be considered to be part of a standard window. The standard window is created with the
following C programming language source code using the OS/2 application program interface call
WinCreateStdWindow(). The comments following the double slashes (//) describe the operation of the source
code.
#define INCL_WIN                 // Required to get Presentation
                                 // Manager definitions.
#include <os2.h>                 // Required to get Presentation
                                 // Manager definitions.

#define ID_LISTBOX 100           // Application-defined window IDs
#define ID_BUTTON  101           // used below (values arbitrary).

// Prototype definition for window
// procedure.
MRESULT EXPENTRY SampleProc( HWND hwnd, ULONG ulMsg, MPARAM mp1,
                             MPARAM mp2 );



HWND  hwndFrame;     // This is a variable to hold a "handle"
                     // to a frame window. A window handle is
                     // unique for each window.
HWND  hwndClient;    // This is a variable to hold a "handle"
                     // to a client window.
ULONG ulFlags;       // This is a variable for the frame data
                     // to be used at creation.
HAB   hAB;           // A Presentation Manager anchor block
                     // handle... not important for this
                     // example. It's a handle which is
                     // received during initialization and
                     // used when terminating.
HMQ   hMQ;           // A message queue. Presentation Manager
                     // uses this to send messages to the
                     // application windows.



// All applications must make this call
// to initialize Presentation Manager.
hAB = WinInitialize( 0 );

// Create a message queue for
// Presentation Manager to use. The
// second parameter means to take the
// default size of message queue.
hMQ = WinCreateMsgQueue( hAB, 0 );

// Register the class of our client
// window. This specifies a function
// which Presentation Manager will use
// to send messages of events that the
// window would like to know about. Some
// messages are WM_SIZE, which tells the
// window that its size is changing,
// WM_CREATE, which tells a window that it
// is being created, and WM_BUTTON1DOWN,
// which tells when a mouse button has
// been clicked in the window.
//
// The arguments for WinRegisterClass():
//
//   hAB        - the handle received from
//                WinInitialize().
//   "Generic"  - the name of our window class. This
//                string will be used to create a window
//                of our type.
//   SampleProc - the name of our window procedure as
//                defined with the above prototype.
//   0L         - class style... none.
//   0L         - amount of special storage reserved
//                for the application's use... none.
//
WinRegisterClass( hAB,
                  "Generic",
                  SampleProc,
                  0L,
                  0L );
// Set up the frame creation data to
// specify some of the specific windows
// desired.
ulFlags = FCF_TITLEBAR | FCF_SYSMENU | FCF_BORDER;

// The arguments for WinCreateStdWindow():
//
//   HWND_DESKTOP - the parent window. Make the frame
//                  be the child of the Presentation
//                  Manager "desk top".
//   0L           - frame style... none.
//   &ulFlags     - address of the frame creation flags.
//   "Generic"    - our previously registered window
//                  class.
//   "Title"      - title to be in title bar.
//   0L           - client window style... none.
//   NULLHANDLE   - implies that frame resources, such as
//                  the menu bar description, are
//                  compiled into the resultant EXE using
//                  the resource compiler that is part of
//                  the OS/2 Toolkit for its application
//                  program interface.
//   10           - ID of the resources in the EXE.
//   &hwndClient  - pass the address of the client window
//                  handle so that the application program
//                  interface can copy back the newly
//                  created client handle.
//
hwndFrame = WinCreateStdWindow( HWND_DESKTOP,
                                0L,
                                &ulFlags,
                                "Generic",
                                "Title",
                                0L,
                                NULLHANDLE,
                                10,
                                &hwndClient );

// Size and position the frame on the
// screen, and make it visible with
// WinSetWindowPos().
//
// The arguments for WinSetWindowPos():
//
//   hwndFrame - handle to our frame for which we want
//               to set the size and position.
//   HWND_TOP  - set the frame above all other frames
//               so that it can be seen and used.
//   10, 20    - the desired position (x, y).
//   300, 500  - the desired size (width, height).
//   SWP_...   - flags telling Presentation Manager to
//               process the size, move the window,
//               apply the z-order given by HWND_TOP,
//               and show it.
//
WinSetWindowPos( hwndFrame,
                 HWND_TOP,
                 10, 20,
                 300, 500,
                 SWP_SIZE | SWP_MOVE | SWP_ZORDER | SWP_SHOW );



// Presentation Manager is a message based system and
// during the create call, a WM_CREATE message is sent to
// the above-registered window procedure. The other child
// windows are created while processing this message. This
// is depicted below:

MRESULT EXPENTRY SampleProc( HWND hwndClient, ULONG ulMsg,
                             MPARAM mp1, MPARAM mp2 )
{
   HWND hwndList;
   HWND hwndButton;

   switch( ulMsg )
   {
      case WM_CREATE:
         // We are processing the WM_CREATE
         // message for our client window which is
         // just being created. The passed window
         // handle, hwndClient, will be returned
         // via the last parameter in the
         // WinCreateStdWindow() call.

         // Now create the child list box.
         //
         // The arguments for WinCreateWindow():
         //
         //   hwndClient - set the parent to be
         //                the client window.
         //   WC_LISTBOX - window class. This is
         //                a list box.
         //   ""         - no title text
         //                associated with the
         //                list box.
         //   WS_...     - window styles make a
         //                visible list box that
         //                allows multiple selection.
         //   0, 0       - initial coordinates at
         //                which to place window.
         //   50, 30     - initial size of
         //                window.
         //   hwndClient - set the owner to be
         //                the client window.
         //   HWND_TOP   - place this window
         //                above all others.
         //   ID_LISTBOX - window id.
         //   NULL       - no control data.
         //   NULL       - no presentation
         //                parameters.
         //
         hwndList = WinCreateWindow( hwndClient,
                                     WC_LISTBOX,
                                     "",
                                     WS_VISIBLE | LS_MULTIPLESEL,
                                     0, 0,
                                     50, 30,
                                     hwndClient,
                                     HWND_TOP,
                                     ID_LISTBOX,
                                     NULL,
                                     NULL );

         // The arguments for WinCreateWindow()
         // are the same as above, with the
         // exceptions that there are different
         // window styles for the button class,
         // there is a different class name, the
         // ID is different, and the button
         // has meaningful text.
         //
         hwndButton = WinCreateWindow( hwndClient,
                                       WC_BUTTON,
                                       "Help",
                                       WS_VISIBLE | BS_PUSHBUTTON,
                                       0, 70,
                                       100, 250,
                                       hwndClient,
                                       HWND_TOP,
                                       ID_BUTTON,
                                       NULL,
                                       NULL );

         // Finished processing the message.
         // Return control to Presentation
         // Manager.
         return( (MRESULT) FALSE );
   }

   // Let Presentation Manager handle all other messages.
   return( WinDefWindowProc( hwndClient, ulMsg, mp1, mp2 ) );
}



EP 0 621 531 A1 



Example II 

Example II illustrates C programming language source code for reading an active-state image. 

Presentation Manager provides an application program interface call for any application to put a "hook"
into the queues of messages which are passed back and forth between windows. A hook is installed with a
callback function which gets called with every message which is sent. Callback functions for hooks must
reside in a Presentation Manager dynamic link library. The required procedure is to load the dynamic link
library which contains the callback function and then set the hook.

HMODULE hm;          // A handle for a loaded dynamic link library.
// This is the function prototype for the
// callback. It follows the syntax for a
// SendMsgHook as described in the IBM
// Presentation Manager Programming Reference,
// Volume III.
VOID EXPENTRY CallbackProc( HAB hAB, PSMHSTRUCT pSmh,
                            BOOL bTask );

// To load the dynamic link library with the callback
// function use DosLoadModule().
//
// The arguments for DosLoadModule() are the following:
//
//   NULL    - no buffer to return error information
//   0       - length of buffer
//   "MYDLL" - name of DLL to load
//   &hm     - address where to return the module
//             handle
//
DosLoadModule( NULL,
               0,
               "MYDLL",
               &hm );



// Now set the hook. The arguments for WinSetHook() are as
// follows:
//
//   hAB          - anchor block handle received from
//                  Presentation Manager initialization.
//   NULLHANDLE   - hook the Presentation Manager system
//                  queue.
//   HK_SENDMSG   - install a hook for sent messages.
//   CallbackProc - callback procedure from the loaded
//                  dynamic link library.
//   hm           - handle to the loaded module.
//
WinSetHook( hAB,
            NULLHANDLE,
            HK_SENDMSG,
            (PFN) CallbackProc,
            hm );

// With the hook installed the callback routine will get
// called every time a message is sent in Presentation
// Manager. One message that contains information that a
// new image (window) is active is WM_SETFOCUS. It can
// be processed as follows to get the frame window which is
// active.

VOID EXPENTRY CallbackProc( HAB hAB, PSMHSTRUCT pSmh, BOOL bTask )
{
   // Declaring some variables.
   HWND hwndWithFocus;
   HWND hwndFrame = NULLHANDLE;
   HWND hwndParent;
   HWND hwndDesktop;

   if (pSmh->msg == WM_SETFOCUS)
   {
      // The callback has been called
      // with a WM_SETFOCUS message.

      // Unpack the message's second
      // parameter. This tells if the
      // message is for a window
      // receiving or losing focus.
      if (SHORT1FROMMP( pSmh->mp2 ))
      {
         // This window is receiving the
         // focus.
         hwndWithFocus = pSmh->hwnd;

         // This may be a child window of
         // an actual image becoming
         // active. Get the absolute
         // parent which is a frame. Look
         // until we've reached the
         // Presentation Manager desk top
         // which is the root of all
         // visible windows.

         // Get the desk top handle as a
         // comparison for the limit.
         hwndDesktop = WinQueryDesktopWindow( hAB, NULLHANDLE );

         hwndParent = hwndWithFocus;

         // Loop to find the last parent
         // in the window chain. Also stop
         // if a window has no parent at all.
         while ( hwndParent != hwndDesktop &&
                 hwndParent != NULLHANDLE )
         {
            hwndFrame = hwndParent;

            // Query for the next parent.
            hwndParent = WinQueryWindow( hwndFrame, QW_PARENT );
         }

         // At this point hwndFrame is the frame for the active
         // image (or NULLHANDLE if the desk top itself has
         // the focus).
      }
   }
}


Example III

55 

Example III illustrates C programming language source code for identifying the list of active-state
commands from the active-state image.
The procedure for creating the list of active-state commands from the image is as follows. (1) Create a
list of all the windows which are children (direct or indirect) of the active frame found above. (2) Identify all
windows in the list by their window class. (3) For windows from window classes which display text to the
user, query all the window text (hidden and visible). (4) Combine a global list of words with a standard list of
words for each window type and with the words which were queried from the application in step (3).

Step (4) merely involves combining multiple arrays of words into one array of words. Therefore, source
code for Step (4) is not illustrated.
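Although the source does not show Step (4), one way to do it can be sketched in portable C (rather than with OS/2 types). The sketch assumes words are plain C strings, that a fixed-capacity array suffices, and that duplicates should be dropped, since words such as "LEFT" can occur in both the global list and a window-type list. The name `merge_words` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Appends each word of src to the vocabulary dst (currently n entries,
   room for cap) unless it is already present. Returns the new count. */
size_t merge_words(const char **dst, size_t n, size_t cap,
                   const char *const *src, size_t nsrc) {
    for (size_t i = 0; i < nsrc; i++) {
        size_t j = 0;
        while (j < n && strcmp(dst[j], src[i]) != 0)
            j++;
        if (j == n && n < cap)   /* not found and there is room */
            dst[n++] = src[i];
    }
    return n;
}
```

Calling it once for the global list, once per window-type list, and once for the queried application words yields the active-state word list. The linear search is adequate for vocabularies of a few hundred words; a hash table would scale better.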

// Step (1) Create a list of all the windows which are
// children (direct or indirect) of the
// active frame found above.
// Assume that we won't have more than 100 child
// windows.

HWND AllWindows[100];   // Declare an array to hold the
                        // window handles.
int  index = 0;         // Index at which to put windows
                        // into the AllWindows[] array.
HWND hwndFrame;         // Assume to be initialized to
                        // the active window in the
                        // CallbackProc() as outlined
                        // above.

// Use a recursive function to get all children.
// Call it initially with the frame:
//
//    FindChildren( hwndFrame );

VOID FindChildren( HWND hwndParent )
{
   HENUM hwndList;
   HWND  hwndChild;

   // Put this window on the list. Increment the index
   // to point to the next available slot in the array.
   // Nothing is stored once the array is full.
   if ( index < 100 )
   {
      AllWindows[ index ] = hwndParent;
      index = index + 1;
   }

   // Initiate an enumeration of the immediate child
   // windows. An enumeration handle, hwndList, is
   // returned. It is used to sequentially access all
   // the child windows.
   hwndList = WinBeginEnumWindows( hwndParent );

   // Loop through all the children until the enumeration
   // returns a 0 window handle, which means that there
   // are no more windows.
   while ( ( hwndChild = WinGetNextWindow( hwndList ) ) != NULLHANDLE )
   {
      // For each window call this function again to get all
      // the children of THIS window.
      FindChildren( hwndChild );
   }

   // End the enumeration.
   WinEndEnumWindows( hwndList );
}

EP 0 621 531 A1
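The recursion in Step (1) records each window before visiting its children, so the array ends up holding the frame followed by every direct and indirect descendant. The following portable C sketch reproduces that depth-first order without Presentation Manager. The `Window` structure, `MAX_CHILDREN`, and `FindChildrenMock` are hypothetical stand-ins for the window tree that WinBeginEnumWindows() and WinGetNextWindow() would walk.

```c
#include <stddef.h>

#define MAX_WINDOWS  100  /* same limit the listing assumes */
#define MAX_CHILDREN 8    /* hypothetical fan-out for this mock */

/* Hypothetical stand-in for a Presentation Manager window: it only
   records its immediate children, the way an enumeration started with
   WinBeginEnumWindows() would report them. */
typedef struct Window {
    const char *name;
    struct Window *children[MAX_CHILDREN];
    size_t nChildren;
} Window;

static Window *allWindows[MAX_WINDOWS];
static size_t windowCount = 0;

/* Record the parent first, then recurse into each child, so the list
   holds the frame followed by all direct and indirect descendants. */
void FindChildrenMock(Window *parent)
{
    if (windowCount < MAX_WINDOWS)
        allWindows[windowCount++] = parent;

    for (size_t i = 0; i < parent->nChildren; i++)
        FindChildrenMock(parent->children[i]);
}
```

For a frame whose client window holds an OK button and a Cancel button, the list comes out as frame, client, OK, Cancel.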



// Step (2) Identify all windows in the list by their
// window class.
// For each window in the list, get its type.

int  i;                             // counting index
CHAR szBuffer[200];                 // buffer to get class name
int  BufSize = sizeof( szBuffer );
HWND hwnd;

for ( i = 0; i < index; i++ )
{
   hwnd = AllWindows[ i ];

   // This next function returns the class name as a
   // string in the buffer which is passed as an
   // argument.
   WinQueryClassName( hwnd, BufSize, szBuffer );

   // Here are some class names defined in Presentation
   // Manager as generic windows. The actual strings are
   // enclosed in quotes, following C programming
   // language string conventions.
   //
   //    "#1"   a frame window
   //    "#3"   a button
   //    "#4"   a menu
   //    "#7"   a list box
   //    "#8"   a scroll bar
}
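Step (2) only retrieves the class-name strings; deciding which windows display text is left to Step (3). One way to act on the class names listed above is a lookup table, sketched below in portable C. The `WindowType` enumeration and `ClassifyWindow()` are hypothetical, and any class not in the table (for example, one registered by the application) is reported as unknown.

```c
#include <string.h>

/* Hypothetical enumeration of the generic window types listed above. */
typedef enum {
    WT_UNKNOWN, WT_FRAME, WT_BUTTON, WT_MENU, WT_LISTBOX, WT_SCROLLBAR
} WindowType;

/* Map a class-name string, as returned by WinQueryClassName(), to one
   of the generic Presentation Manager window types. */
WindowType ClassifyWindow(const char *className)
{
    static const struct {
        const char *name;
        WindowType type;
    } table[] = {
        { "#1", WT_FRAME },
        { "#3", WT_BUTTON },
        { "#4", WT_MENU },
        { "#7", WT_LISTBOX },
        { "#8", WT_SCROLLBAR },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(className, table[i].name) == 0)
            return table[i].type;

    return WT_UNKNOWN;   /* application-registered or other classes */
}
```

A table keeps the mapping in one place, so adding another generic class only means adding a row.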



// Step (3) For windows from window classes which
// display text to the user, query all the
// window text (hidden and visible).
// In this code sample it is shown how to read text
// displayed by an application.
// - Assume that no text is longer than 200 bytes
//   for this example.
// - Assume that pBuffer is pointing to a buffer of
//   shared memory which has been given to the
//   process in which the window resides.
// - Assume that classname has been filled with the
//   class name of the object as described in (2)
//   above.
// - Assume that hwndButton or hwndListbox holds the
//   handle of that object (hwnd in step (2)).

#include <string.h>       // strcmp()

CHAR  classname[100];
CHAR *pBuffer;
int   BufSize = 201;      // 200 bytes of text plus the terminating null
int   ListboxCount;
int   i;

// Get application text for list boxes and buttons.

if ( strcmp( classname, "#3" ) == 0 )
{
   // This is a button. Get its text.
   WinQueryWindowText( hwndButton, BufSize, pBuffer );
}

if ( strcmp( classname, "#7" ) == 0 )
{
   // This is a list box. Loop through all of the items
   // to get all the text. Interfacing with the list box
   // requires the Presentation Manager application
   // program interface call WinSendMsg(). It always has
   // the same 4 parameters:
   //
   //  - window handle
   //  - message
   //  - message-specific parameter or 0
   //  - message-specific parameter or 0

   ListboxCount = SHORT1FROMMR( WinSendMsg( hwndListbox,
                                            LM_QUERYITEMCOUNT,
                                            0, 0 ) );

   // Here's the loop.
   for ( i = 0; i < ListboxCount; i++ )
   {
      // Use Presentation Manager application program
      // interface packing macros for the last 2 parameters.
      // The first is made of two numbers,
      //
      //    MPFROM2SHORT( index of item, buffer size )
      //
      // The second is a pointer to the buffer,
      //
      //    MPFROMP( buffer )

      WinSendMsg( hwndListbox,
                  LM_QUERYITEMTEXT,
                  MPFROM2SHORT( i, BufSize ),
                  MPFROMP( pBuffer ) );

      // The text for one item is in the buffer now. It
      // should be copied to be saved somewhere.
   }
}
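Step (3) leaves the queried text in pBuffer as whole strings, while the command list holds individual words. A minimal sketch of splitting one buffer into words is shown below. It assumes that words are separated by white space and that the `~` character, which Presentation Manager uses to mark a mnemonic letter in control text, should be dropped. `SplitWindowText()`, `MAX_WORDS`, and `MAX_WORD_LEN` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORDS    64   /* hypothetical capacity of the word list */
#define MAX_WORD_LEN 32   /* hypothetical longest word kept, with null */

/* Split one buffer of queried window text into words, dropping the '~'
   mnemonic marker. Words are appended to words[] starting at index
   count, and the new word count is returned. Longer words are cut off. */
size_t SplitWindowText(const char *text,
                       char words[][MAX_WORD_LEN], size_t count)
{
    size_t len = 0;

    for (const char *p = text; ; p++) {
        if (*p == '\0' || isspace((unsigned char)*p)) {
            /* End of a word: terminate it and move to the next slot. */
            if (len > 0 && count < MAX_WORDS) {
                words[count][len] = '\0';
                count++;
            }
            len = 0;
            if (*p == '\0')
                break;
        } else if (*p != '~' && len < MAX_WORD_LEN - 1 && count < MAX_WORDS) {
            words[count][len++] = *p;
        }
    }
    return count;
}
```

Calling it once per queried buffer accumulates the words of every button and list-box item into a single array, ready for the merge in Step (4).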



Example IV 



Example IV illustrates C programming language source code for defining the active-state vocabulary to 
the speech recognizer. 

An application program interface for the speech recognizer is used to set it up for recognition. A
possible application program interface which can be used is the Speech Manager (trademark) application
program interface that comes with the IBM Speech Server Series (trademark) product. Source code using a
similar application program interface is shown below.



#include "smapi.h"        // Speech Manager application
                          // program interface header file

SmArg Args[9];            // Local variable - array of
                          // arguments used to initialize the
                          // speech system.
int   iNumArgs;

// Initialize the speech system. No parameters are used.

SmOpen( 0, NULL );



// Set up the arguments to be used to make a connection.
// The second parameter in the SmSetArg() function is the
// name of the argument. The third parameter is the value.

// Initialize for recognition.
SmSetArg( Args[0], SmNrecognize, TRUE );

// This is the user ID.
SmSetArg( Args[3], SmNuserId, "User" );

// This is the user's trained statistics.
SmSetArg( Args[4], SmNenrollId, "Enroll ID" );

// This is the domain of text to be used.
SmSetArg( Args[5], SmNtask, "Office System" );

// This is a previously created window
// which will be used by the speech
// recognizer to communicate with this
// application.
SmSetArg( Args[6], SmNwindowHandle, hwndCommunication );

// This is an ID to identify messages
// which come from the speech recognizer.
SmSetArg( Args[7], SmNconnectionId, 27 );

// This is the application name.
SmSetArg( Args[8], SmNapplicationName, "Patent Application" );

// Make a connection to the speech recognizer. The last
// parameter to this function tells the speech recognizer
// to make this call asynchronously.

SmConnect( 9, Args, SmAsynchronous );
// Now there is a connection with the speech recognizer.
// The vocabulary created above can now be defined,
// enabled, and used for recognition.

// To define a vocabulary, SmDefineVocab() is used.
// During the define, the speech recognizer looks among a
// large pool of words to find a speech model for the word.
// If no speech model exists, one would have to be added
// before the word can be used. For those that do exist, a
// table is made, including only these, to be used for
// recognition.

// The arguments for SmDefineVocab():
//
//  "Active Vocabulary" - name to be associated with the
//                        vocabulary
//  35                  - number of words in the
//                        vocabulary
//  pWords              - a pointer to an array of the
//                        words in a form specified by
//                        the application program
//                        interface
//  SmAsynchronous      - make the call asynchronously

SmDefineVocab( "Active Vocabulary", 35, pWords,
               SmAsynchronous );

// To enable the vocabulary for recognition, the
// application program interface call
// SmEnableVocab() is used.
// The arguments for SmEnableVocab():
//
//  "Active Vocabulary" - name of the vocabulary to
//                        enable
//  SmAsynchronous      - make the call asynchronously

SmEnableVocab( "Active Vocabulary", SmAsynchronous );

// The system is now ready for recognition. To begin
// recognizing, the microphone is turned on using
// SmMicOn(), and a word is requested using
// SmRecognizeNextWord(). Both calls are made
// asynchronously here.

SmMicOn( SmAsynchronous );
SmRecognizeNextWord( SmAsynchronous );
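The comments on SmDefineVocab() describe its effect: only words that already have a speech model in the recognizer's pool end up in the table used for recognition. The portable C sketch below models that filtering step. The `AcousticModel` structure and `DefineActiveVocab()` are hypothetical; an integer `modelId` stands in for the acoustic command model, which a real recognizer would store as acoustic feature data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry in the system vocabulary: a command word and an
   opaque identifier standing in for its acoustic command model. */
typedef struct {
    const char *word;
    int modelId;
} AcousticModel;

/* Build the active-state vocabulary: for each active-state command,
   copy the matching model from the system vocabulary. Commands with
   no model are counted in *missing and left out, since a model would
   have to be added before they could be recognized. active[] must
   have room for nCommands entries. Returns the number of models kept. */
size_t DefineActiveVocab(const AcousticModel system[], size_t nSystem,
                         const char *const commands[], size_t nCommands,
                         AcousticModel active[], size_t *missing)
{
    size_t nActive = 0;

    *missing = 0;
    for (size_t i = 0; i < nCommands; i++) {
        size_t j = 0;
        while (j < nSystem && strcmp(system[j].word, commands[i]) != 0)
            j++;
        if (j < nSystem)
            active[nActive++] = system[j];
        else
            (*missing)++;
    }
    return nActive;
}
```

Matching is then restricted to the returned models, so an utterance is never compared against a system-vocabulary model outside the active state.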
Example V

Example V illustrates C programming language source code for outputting a command signal corresponding
to the command model from an active-state vocabulary having the best match score.
To begin, a list of commands and command-object associations is manually defined as described
above. Each command, with the exception of the global commands, is associated with an object.

Assume the word "RIGHT" from Table 1 is recognized. From the list of command-object associations, 
the target for the command is known. This target is designated hwndTarget in the example. 
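The selection of a command and its target can be sketched as two small lookups: choose the active-state command model with the best match score, then find the object associated with that command. The sketch below is portable C under stated assumptions: a higher score is treated as a better match, and `CommandObject`, `BestMatch()`, and `FindTarget()` are hypothetical names, with an `unsigned long` standing in for a window handle such as hwndTarget.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical command-object association, defined manually as the
   text describes. */
typedef struct {
    const char *command;
    unsigned long target;   /* stand-in for an HWND such as hwndTarget */
} CommandObject;

/* Return the index of the best-scoring command model; n must be at
   least 1. This sketch assumes a higher score means a better match
   (for example, a log-likelihood); a recognizer that reports distances
   would compare with < instead. */
size_t BestMatch(const double scores[], size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (scores[i] > scores[best])
            best = i;
    return best;
}

/* Look up the object associated with a recognized command word.
   Returns 0 when the command has no associated object (for example,
   a global command). */
unsigned long FindTarget(const CommandObject table[], size_t n,
                         const char *command)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].command, command) == 0)
            return table[i].target;
    return 0;
}
```

With scores of -42.0, -17.5, and -30.1 for "LEFT", "RIGHT", and "ORANGE", BestMatch() picks "RIGHT", and FindTarget() then yields the window that the "RIGHT" command acts upon.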



HWND hwndTarget;

The action defined by "RIGHT" for this target is to move the target to the right by a previously-defined 
increment, for example 10 picture elements (pels). 

#define INCREMENT_RIGHT 10

The command is performed on the target using the OS/2 Presentation Manager application program
interface call named WinSetWindowPos(). The current window position must be queried first so that the new
position can be determined.

SWP swp;                 // Presentation Manager structure for
                         // window position

// Get the initial window position.
//    hwndTarget - target window or object
//    &swp       - address where the target's window
//                 features will be returned

WinQueryWindowPos( hwndTarget, &swp );

// Execute the command, "RIGHT."
//
//    hwndTarget              - target window or object
//    NULLHANDLE              - unneeded parameter
//    swp.x + INCREMENT_RIGHT - new x-coordinate for window
//    swp.y                   - use the same y-coordinate
//    0, 0                    - unneeded parameters
//    SWP_MOVE                - tell the window to move

WinSetWindowPos( hwndTarget,
                 NULLHANDLE,
                 swp.x + INCREMENT_RIGHT,
                 swp.y,
                 0, 0,
                 SWP_MOVE );

Instead, assume the word "ORANGE" is recognized. From the list of command-object associations, the
target for the command is known. This is hwndTarget in the example.
HWND hwndTarget;

The action defined by "ORANGE" for this target is to select the entry in the list box. The command is
performed on the target by sending a message, LM_SELECTITEM, to the list box using the OS/2
Presentation Manager application program interface call named WinSendMsg(). First the index of the item
has to be found.



SHORT sItem;             // item index for querying

// Find the recognized word in the list.
//
//    hwndTarget       - target window or object
//    LM_SEARCHSTRING  - message being sent
//    MPFROM2SHORT()   - Presentation Manager packing macro
//      LSS_PREFIX     - ask for the item index which
//                       begins with the string in the next
//                       parameter
//      LIT_FIRST      - ask for the first item that
//                       matches
//    MPFROMP()        - Presentation Manager packing macro
//      pListboxWord   - the recognized word "ORANGE"

sItem = SHORT1FROMMR( WinSendMsg( hwndTarget,
                                  LM_SEARCHSTRING,
                                  MPFROM2SHORT( LSS_PREFIX, LIT_FIRST ),
                                  MPFROMP( pListboxWord ) ) );
// Select the recognized word.
//
//    hwndTarget       - target window or object
//    LM_SELECTITEM    - message being sent
//    sItem            - the item in the list to act upon
//    TRUE             - select the item
//
// LM_SEARCHSTRING returns LIT_NONE when no item matches,
// so the selection is made only for a valid index.

if ( sItem != LIT_NONE )
{
   WinSendMsg( hwndTarget,
               LM_SELECTITEM,
               MPFROMSHORT( sItem ),
               MPFROMLONG( TRUE ) );
}
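LM_SEARCHSTRING with LSS_PREFIX and LIT_FIRST asks the list box for the first item whose text begins with the recognized word. The portable C sketch below emulates that search over an array of strings so the behavior can be checked without Presentation Manager. It assumes case-insensitive comparison (as when LSS_CASESENSITIVE is not given); `SearchPrefixFirst()`, `HasPrefix()`, and `ITEM_NONE` are hypothetical, with `ITEM_NONE` playing the role of LIT_NONE.

```c
#include <ctype.h>
#include <stddef.h>

#define ITEM_NONE (-1)   /* hypothetical stand-in for LIT_NONE */

/* Case-insensitive test of whether item begins with prefix. Stops at
   the first mismatch, so a shorter item never reads past its null. */
static int HasPrefix(const char *item, const char *prefix)
{
    for (; *prefix != '\0'; item++, prefix++) {
        if (tolower((unsigned char)*item) != tolower((unsigned char)*prefix))
            return 0;
    }
    return 1;
}

/* Return the index of the first list item that begins with word,
   or ITEM_NONE if no item matches. */
int SearchPrefixFirst(const char *const items[], int nItems, const char *word)
{
    for (int i = 0; i < nItems; i++)
        if (HasPrefix(items[i], word))
            return i;
    return ITEM_NONE;
}
```

With the items "Apple", "Banana", "Orange juice", "Orange", a search for "ORANGE" returns 2, because "Orange juice" is the first item with that prefix; this is why a prefix search may select a longer entry than the word actually spoken.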



Claims

1. An interactive computer system comprising: 

a processor executing a target computer program having a series of active program states occurring
over a series of time periods, said target computer program generating active state image data signals
representing an active state image for the active state of the target computer program occurring during
each time period, each active state image containing one or more objects;

means for displaying at least a first active-state image for a first active state occurring during a first
time period;
means for identifying at least one object displayed in the first active-state image, and for generating
from the identified object a list of one or more first active-state commands identifying functions which
can be performed in the first active state of the target computer program;

means for storing a system vocabulary of acoustic command models, each acoustic command model
representing one or more series of acoustic feature values representing an utterance of one or more
words associated with the acoustic command model;

means for identifying a first active-state vocabulary of acoustic command models for the first active
state, the first active-state vocabulary comprising the acoustic command models from the system
vocabulary representing the first active-state commands; and

a speech recognizer for measuring the value of at least one feature of an utterance during each of a
series of successive time intervals within the first time period to produce a series of feature signals,
said speech recognizer comparing the measured feature signals to each of the acoustic command
models in the first active-state vocabulary to generate a match score for the utterance and each
acoustic command model, and said speech recognizer outputting a command signal corresponding to
the command model from the first active-state vocabulary having the best match score.

2. An interactive computer system as claimed in Claim 1, characterized in that:

the first active-state vocabulary comprises substantially less than all of the acoustic command models
from the system vocabulary; and

the speech recognizer does not compare the measured feature signals for the first time period to any
acoustic command model which is not in the first active-state vocabulary.

3. An interactive computer system as claimed in Claim 2, characterized in that:

the display means displays at least a second active-state image different from the first active-state
image for a second active state occurring during a second time period different from the first time
period;

the object identifying means identifies at least one object displayed in the second active-state image,
and generates from the identified object a list of one or more second active-state commands identifying
functions which can be performed in the second active state of the target computer program;

the active-state vocabulary identifying means identifies a second active-state vocabulary of acoustic
command models for the second active state, the second active-state vocabulary comprising the
acoustic command models from the system vocabulary representing the second active-state
commands, the second active-state vocabulary being at least partly different from the first active-state
vocabulary; and

the speech recognizer measures the value of at least one feature of an utterance during each of a
series of successive time intervals within the second time period to produce a series of feature signals,
said speech recognizer comparing the measured feature signals for the second time period to each of
the acoustic command models in the second active-state vocabulary to generate a match score for the
utterance and each acoustic command model, and said speech recognizer outputting a command
signal corresponding to the command model from the second active-state vocabulary having the best
match score.

4. An interactive computer system as claimed in Claim 3, characterized in that the target computer 
program has only one active state occurring during each time period. 


5. An interactive computer system as claimed in Claim 4, characterized in that the target computer 
program comprises an operating system program. 

6. An interactive computer system as claimed in Claim 5, characterized in that the target computer
program comprises an application program and an operating system program.

7. An interactive computer system as claimed in Claim 6, characterized in that the target computer
program comprises two or more application programs and an operating system program.

8. An interactive computer system as claimed in Claim 6, characterized in that at least some commands
for an active-state identify functions which can be performed on the identified objects in the active-state
image for the state.
9. An interactive computer system as claimed in Claim 8, characterized in that the identified object in an
active-state image comprises one or more of a character, a word, an icon, a button, a scroll bar, a
slider, a list box, a menu, a check box, a container, or a notebook.

10. An interactive computer system as claimed in Claim 9, characterized in that the speech recognizer
outputs two or more command signals corresponding to the command models from the active-state
vocabulary having the best match scores for a given time period.

11. An interactive computer system as claimed in Claim 10, characterized in that the vocabulary of acoustic
command models for each active state further comprises a set of global acoustic command models
representing global commands identifying functions which can be performed in each active state of the
target computer program.

12. An interactive computer system as claimed in Claim 11, characterized in that the display means
comprises a display.

13. An interactive computer system as claimed in Claim 11, characterized in that the display means
displays both an active-state image for an active state occurring during a time period, and at least a
portion of one or more images for program states not occurring during the time period.

14. A method of computer interaction comprising:

executing, on a processor, a target computer program having a series of active program states
occurring over a series of time periods, said target computer program generating active state image
data signals representing an active state image for the active state of the target computer program
occurring during each time period, each active state image containing one or more objects;

displaying at least a first active-state image for a first active state occurring during a first time period; 
identifying at least one object displayed in the first active-state image, and generating from the 
identified object a list of one or more first active-state commands identifying functions which can be 
performed in the first active state of the target computer program; 

storing a system vocabulary of acoustic command models, each acoustic command model representing
one or more series of acoustic feature values representing an utterance of one or more words
associated with the acoustic command model;

identifying a first active-state vocabulary of acoustic command models for the first active state, the first
active-state vocabulary comprising the acoustic command models from the system vocabulary
representing the first active-state commands;

measuring the value of at least one feature of an utterance during each of a series of successive time 
intervals within the first time period to produce a series of feature signals; 

comparing the measured feature signals to each of the acoustic command models in the first
active-state vocabulary to generate a match score for the utterance and each acoustic command model; and

outputting a command signal corresponding to the command model from the first active-state
vocabulary having the best match score.

15. A method of computer interaction as claimed in Claim 14,
characterized in that:

the first active-state vocabulary comprises substantially less than all of the acoustic command models
from the system vocabulary; and

the step of comparing does not compare the measured feature signals for the first time period to any
acoustic command model which is not in the first active-state vocabulary.

16. A method of computer interaction as claimed in Claim 15,
further comprising the steps of:

displaying at least a second active-state image different from the first active-state image for a second
active state occurring during a second time period different from the first time period;

identifying at least one object displayed in the second active-state image, and generating from the
identified object a list of one or more second active-state commands identifying functions which can be
performed in the second active state of the target computer program;

identifying a second active-state vocabulary of acoustic command models for the second active state,
the second active-state vocabulary comprising the acoustic command models from the system
vocabulary representing the second active-state commands, the second active-state vocabulary being
at least partly different from the first active-state vocabulary;

measuring the value of at least one feature of an utterance during each of a series of successive time
intervals within the second time period to produce a series of feature signals;

comparing the measured feature signals for the second time period to each of the acoustic command
models in the second active-state vocabulary to generate a match score for the utterance and each
acoustic command model; and

outputting a command signal corresponding to the command model from the second active-state
vocabulary having the best match score.


17. A method of computer interaction as claimed in Claim 16, characterized in that the target computer 
program has only one active state occurring during each time period. 

18. A method of computer interaction as claimed in Claim 17, characterized in that the target computer
program comprises an operating system program.

19. A method of computer interaction as claimed in Claim 18, characterized in that the target computer 
program comprises an application program and an operating system program. 

20. A method of computer interaction as claimed in Claim 19, characterized in that the target computer
program comprises two or more application programs and an operating system program.

21. A method of computer interaction as claimed in Claim 19, characterized in that at least some
commands for an active-state identify functions which can be performed on the identified objects in the
active-state image for the state.

22. A method of computer interaction as claimed in Claim 21, characterized in that the identified object in
an active-state image comprises one or more of a character, a word, an icon, a button, a scroll bar, a 
slider, a list box, a menu, a check box, a container, or a notebook. 


23. A method of computer interaction as claimed in Claim 22, characterized in that the step of outputting a 
command signal comprises outputting two or more command signals corresponding to the command 
models from the active-state vocabulary having the best match scores for a given time period. 

24. A method of computer interaction as claimed in Claim 23, characterized in that the vocabulary of
acoustic command models for each active state further comprises a set of global acoustic command
models representing global commands identifying functions which can be performed in each active
state of the target computer program.

25. A method of computer interaction as claimed in Claim 24, further comprising the step of displaying
both an active-state image for an active state occurring during a time period, and at least a portion of
one or more images for program states not occurring during the time period.